Journal: Advances in Data Analysis and Classification

A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification

Abstract

High-dimensional gene expression data pose two common problems: many of the genes may be irrelevant, and the genes are often highly correlated with one another. Gene selection has been proven to be an effective way to improve the results of many classification methods. Sparse logistic regression, using either the least absolute shrinkage and selection operator (lasso) or the smoothly clipped absolute deviation (SCAD) penalty, is one of the most widely applied methods for gene selection in cancer classification. However, this method faces a critical challenge in practical applications when genes are highly correlated. To address this problem, a two-stage sparse logistic regression is proposed. It aims to obtain an efficient subset of genes with high classification capability by combining a screening approach as a filter method with an adaptive lasso, using a new weight, as an embedded method. In the first stage, sure independence screening (SIS) retains the genes that individually show a high correlation with the cancer class level. In the second stage, the adaptive lasso with the new weight is applied to handle the high correlations among the genes retained by the first stage. Experimental results on four publicly available gene expression datasets show that the proposed method significantly outperforms three state-of-the-art methods in terms of classification accuracy, G-mean, area under the curve, and stability. In addition, the results demonstrate that the top selected genes are biologically related to the cancer type. Thus, the proposed method can be useful for cancer classification using DNA gene expression data in real clinical practice.
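
The two-stage idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the paper's "new weight", so the sketch swaps in the standard adaptive-lasso weight 1/|β_init|^γ taken from a ridge-penalized initial fit. Screening here ranks genes by absolute Pearson correlation with the 0/1 class label, and the solver is plain proximal gradient descent. All function names, parameter values, and the synthetic data are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sis_screen(X, y, keep):
    # Stage 1 (assumed form of screening): rank genes by the absolute
    # Pearson correlation with the 0/1 label and keep the top `keep`.
    n = len(y)
    y_mean = sum(y) / n
    y_dev = [v - y_mean for v in y]
    y_ss = sum(d * d for d in y_dev)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        dev = [v - mean for v in col]
        denom = math.sqrt(sum(d * d for d in dev) * y_ss)
        corr = sum(a * b for a, b in zip(dev, y_dev)) / denom if denom > 0 else 0.0
        scores.append((abs(corr), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:keep])

def fit_logistic(X, y, l1_weights, lam, ridge=0.0, step=0.1, iters=500):
    # Proximal gradient descent for logistic loss with a weighted L1
    # penalty (lam * sum_j w_j * |beta_j|) and an optional ridge term.
    n, p = len(X), len(X[0])
    beta, b0 = [0.0] * p, 0.0
    for _ in range(iters):
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        b0 -= step * sum(resid) / n  # intercept is not penalized
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n + ridge * beta[j]
            z = beta[j] - step * grad
            thresh = step * lam * l1_weights[j]
            beta[j] = math.copysign(max(abs(z) - thresh, 0.0), z)  # soft-threshold
    return b0, beta

def two_stage_select(X, y, keep=5, lam=0.05, gamma=1.0):
    kept = sis_screen(X, y, keep)
    Xs = [[row[j] for j in kept] for row in X]
    # Initial ridge fit supplies the adaptive weights (standard choice,
    # NOT the paper's new weight, which the abstract does not specify).
    _, init = fit_logistic(Xs, y, [0.0] * len(kept), lam=0.0, ridge=0.1)
    weights = [1.0 / (abs(b) ** gamma + 1e-8) for b in init]
    # Stage 2: adaptive lasso on the screened genes only.
    _, beta = fit_logistic(Xs, y, weights, lam)
    selected = {kept[k]: beta[k] for k in range(len(kept)) if beta[k] != 0.0}
    return kept, selected

# Synthetic stand-in for expression data: 100 samples, 30 "genes",
# where only genes 0 and 1 drive the class label.
random.seed(0)
n, p = 100, 30
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if row[0] - row[1] + 0.5 * random.gauss(0, 1) > 0 else 0 for row in X]
kept, selected = two_stage_select(X, y)
print("screened genes:", kept)
print("selected genes and coefficients:", selected)
```

Screening first keeps the penalized fit small, and the per-gene weights let the second stage penalize weakly supported genes more heavily than strongly supported ones. On real microarray data the number of screened genes and the penalty strength would normally be tuned, for example by cross-validation.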
