A Novel Dimensionality Reduction Technique Based on Independent Component Analysis for Modeling Microarray Gene Expression Data

Abstract

DNA microarray experiments, which generate thousands of gene expression measurements, are being used to gather information from tissue and cell samples about gene expression differences that will be useful in diagnosing disease. One challenge of microarray studies, however, is that the number n of samples collected is small relative to the number p of genes per sample, which is usually in the thousands. In statistical terms, having so many predictors compared with so few samples or observations makes the classification problem difficult; this is known as the "curse of dimensionality". An efficient way to address this problem is to use dimensionality reduction techniques. Principal Component Analysis (PCA), which is optimal in the least-squares error sense, is a leading method for reducing the dimensionality of gene expression data. In this paper we propose a new dimensionality reduction technique for specific bioinformatics applications based on Independent Component Analysis (ICA). Because it exploits higher-order statistics to identify a linear model, this ICA-based technique outperforms PCA in terms of both statistical and biological significance. We present experiments on the NCI 60 dataset that demonstrate this result.
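The abstract contrasts PCA, which keeps the directions of largest variance and minimizes squared reconstruction error, with ICA, which relies on higher-order statistics. The paper's own algorithm is not given here, so the following is only a minimal NumPy sketch of one common generic pipeline, assuming samples are the rows of an n × p expression matrix: PCA via SVD, followed by a symmetric FastICA rotation (log-cosh contrast) of the whitened PCA scores. The function names `pca_reduce`, `ica_reduce`, and `_sym_decorrelate`, the choice of FastICA, and all parameter defaults are illustrative assumptions, not the authors' method.

```python
import numpy as np


def pca_reduce(X, k):
    """Project samples (rows of X) onto the top-k principal components.

    X has shape (n_samples, p_genes). Returns (scores, components) with
    shapes (n_samples, k) and (k, p_genes). scores @ components is the
    rank-k approximation of the gene-centered data with the smallest
    squared (least-squares) reconstruction error.
    """
    Xc = X - X.mean(axis=0)                  # center each gene across samples
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                # coordinates in the top-k PC basis
    return scores, Vt[:k]


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, whose rows are orthonormal."""
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ W


def ica_reduce(X, k, max_iter=1000, tol=1e-6, seed=0):
    """Hypothetical ICA-based reduction: PCA-whiten to k dims, then FastICA.

    Assumption: symmetric FastICA with the log-cosh contrast, i.e. the
    nonlinearity g(u) = tanh(u). Unlike PCA, which only uses variances,
    the update depends on higher-order statistics through g. Returns an
    (n_samples, k) array of estimated independent components.
    """
    scores, _ = pca_reduce(X, k)
    Z = scores / scores.std(axis=0)          # whitened: zero mean, identity covariance
    n = Z.shape[0]

    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((k, k)))  # random orthonormal start
    for _ in range(max_iter):
        Y = Z @ W.T                          # current component estimates, n x k
        g = np.tanh(Y)
        g_prime = 1.0 - g ** 2
        # Fixed-point step for every row w_i: E[z g(w_i.z)] - E[g'(w_i.z)] w_i
        W_new = _sym_decorrelate(g.T @ Z / n - g_prime.mean(axis=0)[:, None] * W)
        # Converged when each row keeps its direction (sign flips allowed)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix: 60 samples x 2000 genes
    # driven by 5 sparse (Laplace-distributed) latent factors plus noise.
    rng = np.random.default_rng(1)
    sources = rng.laplace(size=(60, 5))
    X = sources @ rng.standard_normal((5, 2000)) + 0.1 * rng.standard_normal((60, 2000))
    pca_scores, _ = pca_reduce(X, 5)
    ica_scores = ica_reduce(X, 5)
    print(pca_scores.shape, ica_scores.shape)  # prints (60, 5) (60, 5)
```

Both functions return an n × k feature matrix that could be passed to a downstream classifier. PCA orders its components by variance, whereas the ICA output is whitened, so its components all have unit variance and come out in arbitrary order and sign; some other criterion (for example kurtosis) would be needed to rank them if fewer components are wanted.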
