Journal of theoretical & computational chemistry

Adaptive lasso with weights based on normalized filtering scores in molecular big data



Abstract

Molecular big data are highly correlated, and many of the genes are not relevant. The performance of classification methods depends mainly on selecting the significant genes. Sparse regularized regression (SRR) models using the least absolute shrinkage and selection operator (lasso) and the adaptive lasso (alasso) are widely used for gene selection and classification. However, gene selection becomes challenging when the genes are highly correlated. Here, we propose a modified adaptive lasso whose weights are derived from ranking-based feature selection (RFS) methods, which can handle highly correlated gene expression data. First, RFS methods such as Fisher's score (FS), Chi-square (CS), and information gain (IG) are used to discard the unimportant genes, and the top-ranked genes are chosen through the sure independence screening (SIS) criterion. The scores of the ranked genes are then normalized and assigned as weights in the alasso method to obtain the most significant genes, which were shown to be biologically related to the cancer type and helped attain higher classification performance. Using synthetic data and real microarray data, we demonstrate that the proposed alasso with RFS methods performs better than other known methods: alasso with filtering methods such as ridge and marginal maximum likelihood estimation (MMLE), and lasso and alasso without filtering. Model performance is evaluated using accuracy, area under the receiver operating characteristic curve (AUROC), and geometric mean (GM-mean).
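The pipeline described in the abstract (score, screen, normalize, fit a weighted lasso) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not state the normalization, the SIS cutoff, or the solver, so the following are all assumptions made for illustration: scores are divided by their maximum and inverted to give penalty weights, the cutoff is d = floor(n / log n), and the weighted-L1 logistic regression is fitted by proximal gradient descent. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score per feature: between-class over within-class variance."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xk = X[y == label]
        between += len(Xk) * (Xk.mean(axis=0) - overall_mean) ** 2
        within += len(Xk) * Xk.var(axis=0)
    return between / (within + 1e-12)

def sis_top_features(scores, n_samples):
    """Indices of the d = floor(n / log n) best features (assumed SIS cutoff)."""
    d = min(len(scores), int(n_samples / np.log(n_samples)))
    return np.argsort(scores)[::-1][:d]

def penalty_weights(scores, eps=1e-6):
    """ASSUMPTION: max-normalize the scores, then invert them so that
    higher-ranked genes receive a smaller L1 penalty."""
    normalized = scores / scores.max()
    return 1.0 / (normalized + eps)

def adaptive_lasso_logistic(X, y, weights, lam=0.05, n_iter=2000):
    """Weighted-L1 logistic regression via proximal gradient descent (ISTA)."""
    n, p = X.shape
    beta, intercept = np.zeros(p), 0.0
    # Step 1/L, with L bounded by (||X||_2^2 + n) / (4n) for the logistic loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        linear = np.clip(X @ beta + intercept, -30.0, 30.0)
        residual = 1.0 / (1.0 + np.exp(-linear)) - y
        intercept -= step * residual.mean()
        z = beta - step * (X.T @ residual) / n
        # Soft-thresholding with a per-feature threshold lam * weight_j.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam * weights, 0.0)
    return beta, intercept

# Hypothetical synthetic data: 100 samples, 500 "genes", 3 of them informative.
rng = np.random.default_rng(0)
n, p = 100, 500
X = rng.normal(size=(n, p))
y = (X[:, 0] + X[:, 1] - X[:, 2] + 0.5 * rng.normal(size=n) > 0).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)

scores = fisher_score(X, y)
keep = sis_top_features(scores, n)            # screening step
w = penalty_weights(scores[keep])             # normalized scores -> weights
beta, b0 = adaptive_lasso_logistic(X[:, keep], y, w)
selected = keep[beta != 0]                    # genes retained by the model
print("screened:", len(keep), "selected:", len(selected))
```

Chi-square or information gain scores could be substituted for `fisher_score` without changing the rest of the sketch, since only the ranking and the normalized score values feed into the weights.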
