Nucleic Acids Research

Multi-class cancer classification by total principal component regression (TPCR) using microarray gene expression data


Abstract

DNA microarray technology provides a promising approach to the diagnosis and prognosis of tumors on a genome-wide scale by monitoring the expression levels of thousands of genes simultaneously. One problem arising from the use of microarray data is the difficulty of analyzing high-dimensional gene expression data, which typically have thousands of variables (genes) but far fewer observations (samples) and often exhibit severe collinearity. This makes it difficult to apply classical statistical methods directly to microarray data. In this paper, total principal component regression (TPCR) is proposed to classify human tumors by extracting the latent variable structure underlying microarray data from the augmented subspace of both independent variables and dependent variables. One of the salient features of our method is that it takes into account not only the latent variable structure but also the errors in the microarray gene expression profiles (independent variables). The prediction performance of TPCR was evaluated by both leave-one-out and leave-half-out cross-validation using four well-known microarray datasets. The stability and reliability of the classification models were further assessed by re-randomization and permutation studies. A fast kernel algorithm was applied to decrease the computation time dramatically. (MATLAB source code is available upon request.)
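
The augmented-subspace idea in the abstract can be illustrated with a small sketch. This is not the authors' MATLAB code, and the abstract does not give the algorithm's details. The sketch assumes a truncated total-least-squares formulation: the centered expression matrix X and a one-hot class matrix Y are stacked side by side, the leading k right singular vectors of [X, Y] define the latent subspace, and the coefficients B are the minimum-norm solution of V11^T B = V21^T. The function names (fit_tls_pcr, predict, loo_accuracy), the one-hot coding, the choice k = 2, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Encode integer class labels as rows of an indicator matrix."""
    return np.eye(n_classes)[labels]

def fit_tls_pcr(X, Y, k):
    """Truncated total-least-squares fit on the augmented matrix [X, Y].

    Hypothetical helper: keeps k latent components of the centered
    augmented matrix, so noise in X and in Y is treated together.
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Z = np.hstack([X - x_mean, Y - y_mean])          # n x (p + q)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    p = X.shape[1]
    V11 = Vt[:k, :p].T                               # p x k block (genes)
    V21 = Vt[:k, p:].T                               # q x k block (classes)
    # Minimum-norm B satisfying V11^T B = V21^T, i.e. X_k B = Y_k
    B = np.linalg.pinv(V11.T) @ V21.T                # p x q
    return B, x_mean, y_mean

def predict(model, X):
    """Assign each sample to the class with the largest fitted score."""
    B, x_mean, y_mean = model
    return ((X - x_mean) @ B + y_mean).argmax(axis=1)

def loo_accuracy(X, labels, n_classes, k):
    """Leave-one-out cross-validation: refit with each sample held out."""
    n = len(labels)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit_tls_pcr(X[keep], one_hot(labels[keep], n_classes), k)
        hits += int(predict(model, X[i:i + 1])[0] == labels[i])
    return hits / n

# Synthetic stand-in for a microarray dataset: 30 samples, 200 "genes".
rng = np.random.default_rng(0)
n_classes, per_class, n_genes = 3, 10, 200
centers = rng.normal(0.0, 2.0, size=(n_classes, n_genes))
labels = np.repeat(np.arange(n_classes), per_class)
X = centers[labels] + rng.normal(0.0, 1.0, size=(labels.size, n_genes))

acc = loo_accuracy(X, labels, n_classes, k=2)
print(f"leave-one-out accuracy: {acc:.2f}")
```

The economy-size SVD keeps the cost tied to the number of samples rather than the number of genes, which suits the "many genes, few samples" setting described above. The fast kernel algorithm mentioned in the abstract is not reproduced here.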
