Conference: International Conference on Genome Informatics

IMPROVING GENE EXPRESSION CANCER MOLECULAR PATTERN DISCOVERY USING NONNEGATIVE PRINCIPAL COMPONENT ANALYSIS

Abstract

Robust identification of cancer molecular patterns from microarray data not only plays an essential role in modern clinical oncology but also presents a challenge for statistical learning. Although principal component analysis (PCA) is a widely used feature extraction algorithm in microarray analysis, its holistic mechanism prevents it from capturing the latent local data structure needed in subsequent cancer molecular pattern identification. In this study, we investigate the benefit of enforcing nonnegativity constraints on PCA and propose a classification algorithm based on nonnegative principal component analysis (NPCA) for cancer molecular pattern analysis of gene expression data. The algorithm classifies the meta-samples of the input cancer data with support vector machines (SVM) or other classic supervised learning algorithms. The meta-samples are low-dimensional projections of the original cancer samples onto a purely additive meta-gene subspace generated by the NPCA-induced nonnegative matrix factorization (NMF). We report leading classification results from the NPCA-SVM algorithm for cancer molecular pattern identification on five benchmark gene expression datasets, under 100 trials of 50% hold-out cross-validation and under leave-one-out cross-validation. We demonstrate the superiority of NPCA-SVM by direct comparison with seven classification algorithms (SVM, PCA-SVM, KPCA-SVM, NMF-SVM, LLE-SVM, PCA-LDA, and k-NN) on the five cancer datasets in terms of classification rate, sensitivity, and specificity. Our NPCA-SVM algorithm overcomes the overfitting problem associated with SVM-based classification of gene expression data under a Gaussian kernel. As a more robust high-performance classifier, NPCA-SVM can replace general SVM and k-NN classifiers in cancer biomarker discovery to capture more meaningful oncogenes.
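The meta-sample pipeline described in the abstract has three steps. First, factor a nonnegative genes-by-samples matrix into meta-genes W and meta-samples H. Second, represent each new sample by its nonnegative coordinates in the meta-gene subspace. Third, classify those low-dimensional meta-samples. This can be sketched in plain Python under loud assumptions. It is not the authors' NPCA method: it substitutes standard Lee–Seung multiplicative-update NMF for the NPCA-induced factorization, and a nearest-centroid rule for the SVM. The toy 6-gene data, the rank k = 2, the iteration counts, and every function name are hypothetical and chosen only for illustration.

```python
import random

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def transpose(A):
    return [list(r) for r in zip(*A)]


def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def nmf(X, k, iters=500, seed=0):
    """Lee-Seung NMF (stand-in for NPCA-induced NMF): X (genes x samples) ~ W @ H."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]  # meta-genes
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]  # meta-samples
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(n)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(k)] for i in range(m)]
    return W, H


def project(W, x, iters=500):
    """Meta-sample of a new sample x: nonnegative h approximately minimizing ||x - W h||, W fixed."""
    Wt = transpose(W)
    WtW = matmul(Wt, W)
    Wtx = [sum(w * xi for w, xi in zip(row, x)) for row in Wt]
    h = [1.0] * len(Wt)
    for _ in range(iters):
        WtWh = [sum(a * b for a, b in zip(row, h)) for row in WtW]
        h = [h[i] * Wtx[i] / (WtWh[i] + EPS) for i in range(len(h))]
    return h


def nearest_centroid(train_meta, labels, query):
    """Stand-in classifier (the paper uses SVM): closest class mean in meta-sample space."""
    centroids = {}
    for lab in set(labels):
        members = [v for v, l in zip(train_meta, labels) if l == lab]
        centroids[lab] = [sum(c) / len(members) for c in zip(*members)]
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], query)))


def sensitivity_specificity(y_true, y_pred, positive):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: class A over-expresses genes 0-2, class B genes 3-5.
samples = [
    [5, 4, 5, 0.1, 0.2, 0.1], [4, 5, 4, 0.2, 0.1, 0.3], [5, 5, 4, 0.1, 0.1, 0.2],
    [0.1, 0.2, 0.1, 5, 4, 5], [0.3, 0.1, 0.2, 4, 5, 4], [0.2, 0.1, 0.1, 5, 5, 4],
]
labels = ["A", "A", "A", "B", "B", "B"]

W, H = nmf(transpose(samples), k=2)  # genes x samples in, rank-2 factorization out
train_meta = transpose(H)            # one 2-dimensional meta-sample per training sample

pred_a = nearest_centroid(train_meta, labels, project(W, [4.5, 4.8, 5.1, 0.2, 0.1, 0.2]))
pred_b = nearest_centroid(train_meta, labels, project(W, [0.1, 0.3, 0.2, 4.6, 5.2, 4.4]))
print(pred_a, pred_b, sensitivity_specificity(["A", "B"], [pred_a, pred_b], positive="A"))
```

Swapping `nearest_centroid` for a real SVM (for example scikit-learn's `SVC`) would bring the sketch closer to the NPCA-SVM setup. A 50% hold-out protocol like the one in the abstract would repeat the factor, project, and classify steps on random train/test splits and then average the sensitivities and specificities.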