Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Investigating the Efficacy of Nonlinear Dimensionality Reduction Schemes in Classifying Gene and Protein Expression Studies

Abstract

The recent explosion in procurement and availability of high-dimensional gene- and protein-expression profile datasets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. A major limitation in the ability to accurately classify these high-dimensional datasets stems from the "curse of dimensionality", which occurs when the number of genes or peptides significantly exceeds the total number of patient samples. Previous attempts at dealing with this issue have mostly centered on the use of a dimensionality reduction (DR) scheme, Principal Component Analysis (PCA), to obtain a low-dimensional projection of the high-dimensional data. However, linear PCA and other linear DR methods, which rely on Euclidean distances to estimate object similarity, do not account for the inherent underlying nonlinear structure associated with most biomedical data. The motivation behind this work is to identify the appropriate DR methods for analysis of high-dimensional gene- and protein-expression studies. Towards this end, we empirically and rigorously compare three nonlinear (Isomap, Locally Linear Embedding, Laplacian Eigenmaps) and three linear DR schemes (PCA, Linear Discriminant Analysis, Multidimensional Scaling) with the intent of determining a reduced subspace representation in which the individual object classes are more easily discriminable.
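
The contrast the abstract draws between Euclidean similarity and nonlinear structure can be illustrated with a small toy sketch. It is not the authors' code or data: the semicircle points and the helper names `euclidean` and `geodesic_distances` are assumptions made for illustration. It compares the straight-line distance between two points with the distance measured along a k-nearest-neighbor graph, which is the kind of geodesic estimate Isomap builds on.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def geodesic_distances(points, k=2):
    """Approximate geodesic distances as shortest paths in a k-NN graph.

    Hypothetical helper: each point is linked to its k nearest neighbors,
    then Floyd-Warshall computes all-pairs shortest path lengths.
    """
    n = len(points)
    inf = float("inf")
    d = [[inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
        # k nearest neighbors of point i (excluding i itself)
        nbrs = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(points[i], points[j]),
        )[:k]
        for j in nbrs:
            w = euclidean(points[i], points[j])
            d[i][j] = min(d[i][j], w)
            d[j][i] = min(d[j][i], w)  # keep the graph symmetric
    # Floyd-Warshall: relax every pair through every intermediate node
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d

# Toy data (an assumption): 21 points sampled along a unit semicircle,
# a one-dimensional curved structure embedded in two dimensions.
n = 21
points = [
    (math.cos(math.pi * t / (n - 1)), math.sin(math.pi * t / (n - 1)))
    for t in range(n)
]

straight = euclidean(points[0], points[-1])      # cuts across the arc
along = geodesic_distances(points, k=2)[0][-1]   # follows the arc

print(f"Euclidean distance between endpoints: {straight:.3f}")
print(f"Graph (geodesic) distance:            {along:.3f}")
```

On this toy curve the straight-line distance between the endpoints is the diameter of the circle, while the graph distance stays just under the arc length of pi. A method that relies only on Euclidean distance would therefore treat the two ends of the curve as closer than they are along the underlying structure.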
