Generalizaciones de minimos cuadrados parciales con aplicacion en clasificacion supervisada (Spanish text).

Machine translation: Generalizations of partial least squares with application to supervised classification (Spanish text).

Abstract

The development of technologies such as microarrays has generated a large amount of data. The main characteristic of this kind of data is the large number of predictors (genes) and the small number of observations (experiments). Thus, the data matrix X is of order n x p, where n is much smaller than p. Before using any multivariate statistical technique, such as regression or classification, to analyze the information contained in these data, we need to apply feature selection methods and/or dimensionality reduction using orthogonal variables. The aim is to eliminate multicollinearity among the predictor variables, which can lead to severe prediction errors, and to reduce the computational burden required to build and validate the classifier.

Principal component analysis (PCA) is a technique that has been used for some time to reduce dimensionality. However, the first components, which capture most of the variability in the data structure, do not necessarily improve prediction when they are used for regression and classification (Yeung and Ruzzo, 2001). Partial least squares (PLS), introduced by Wold (1975), was an important contribution to reducing dimensionality in a regression context using orthogonal components. The assurance that the first PLS components improve prediction has made PLS a widely used technique, particularly in the area of chemistry known as chemometrics. Nguyen and Rocke (2002), working on supervised classification methods for microarray data, reduced the dimensionality by first applying feature selection with statistical techniques such as difference of means and analysis of variance, and then applying PLS regression with the vector of classes (a categorical variable) treated as a continuous response vector. This procedure is not adequate, since the predictions are not necessarily integers and must be rounded, which loses accuracy. In spite of these shortcomings, regression PLS yields reasonable results.

In this thesis we implement generalizations of regression PLS as a dimensionality reduction technique to be applied in supervised classification. We extend a technique introduced by Bastien et al. (2002), who combined PLS with ordinal logistic regression for multiclass problems. However, since ordered classes are very uncommon, in this work PLS is combined with nominal logistic regression. Multivariate PLS combined with logistic regression was also considered, as well as the construction of PLS components from linear discriminant analysis and from projection pursuit. The proposals presented in this thesis improve on two recent results, by Fort and Lambert (2004) and by Ding and Gentleman (2004), that combine logistic regression and PLS but are suitable only for datasets with two classes. A library of R functions was built to carry out the different proposals.
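As a rough illustration of the general idea in the abstract (PLS components used as low-dimensional inputs to a nominal logistic classifier when n is much smaller than p), the following Python sketch may help. It is not the author's method or the thesis's R library: the function names (`pls_fit`, `fit_softmax`, `make_data`, `predict`), the synthetic data, the use of a class-indicator matrix as the multivariate response, and the gradient-descent fit with a small ridge penalty are all assumptions made for this example.

```python
import numpy as np

def pls_fit(X, Y, n_components):
    """PLS components from X guided by a (multi-column) response Y.

    Returns the rotation R and the column means, so scores are (X - x_mean) @ R.
    """
    x_mean = X.mean(axis=0)
    Xk = X - x_mean                      # working copy, deflated each round
    Yc = Y - Y.mean(axis=0)
    W, P = [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of Xk' Yc.
        u, _, _ = np.linalg.svd(Xk.T @ Yc, full_matrices=False)
        w = u[:, 0]
        t = Xk @ w                       # score vector for this component
        p = Xk.T @ t / (t @ t)           # X loading
        Xk = Xk - np.outer(t, p)         # deflation keeps later scores orthogonal
        W.append(w)
        P.append(p)
    W, P = np.column_stack(W), np.column_stack(P)
    R = W @ np.linalg.inv(P.T @ W)       # maps centered X directly to scores
    return R, x_mean

def fit_softmax(T, labels, n_classes, lr=0.5, n_iter=500, ridge=1e-2):
    """Nominal (multinomial) logistic regression by plain gradient descent."""
    n = T.shape[0]
    A = np.hstack([np.ones((n, 1)), T])  # intercept column plus scores
    Y = np.eye(n_classes)[labels]        # class-indicator matrix
    B = np.zeros((A.shape[1], n_classes))
    for _ in range(n_iter):
        Z = A @ B
        Z -= Z.max(axis=1, keepdims=True)            # numerical stability
        prob = np.exp(Z)
        prob /= prob.sum(axis=1, keepdims=True)
        B -= lr * (A.T @ (prob - Y) / n + ridge * B)
    return B

def make_data(rng, n_per_class, p=200, k=3, shift=3.0):
    """Hypothetical 'microarray-like' data: few samples, many predictors."""
    labels = np.repeat(np.arange(k), n_per_class)
    X = rng.normal(size=(labels.size, p))
    for c in range(k):
        X[labels == c, 5 * c:5 * c + 5] += shift     # 5 informative genes per class
    return X, labels

rng = np.random.default_rng(0)
X_train, y_train = make_data(rng, 20)    # n = 60, p = 200
X_test, y_test = make_data(rng, 20)

k = 3
R, x_mean = pls_fit(X_train, np.eye(k)[y_train], n_components=3)
T_train = (X_train - x_mean) @ R
scale = T_train.std(axis=0)              # standardize scores before fitting
B = fit_softmax(T_train / scale, y_train, k)

def predict(X):
    T = (X - x_mean) @ R / scale
    A = np.hstack([np.ones((T.shape[0], 1)), T])
    return (A @ B).argmax(axis=1)        # most probable class, no rounding needed

train_acc = (predict(X_train) == y_train).mean()
test_acc = (predict(X_test) == y_test).mean()
print(f"train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

Because the classifier outputs class probabilities, the predicted label is simply the most probable class, which avoids rounding a continuous PLS regression prediction to an integer class code. The number of components (3 here) is an arbitrary choice for the sketch; in practice it would usually be selected by cross-validation.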

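Bibliographic record

  • Author

    Vega Vilca, Jose Carlos

  • Author affiliation

    University of Puerto Rico, Mayaguez (Puerto Rico)

  • Degree-granting institution University of Puerto Rico, Mayaguez (Puerto Rico)
  • Subject Computer Science; Statistics
  • Degree Ph.D.
  • Year 2004
  • Pages 118 p.
  • Total pages 118
  • Original format PDF
  • Text language eng
  • Chinese Library Classification Automation and computer technology; Statistics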