IEEE/ACM Transactions on Computational Biology and Bioinformatics

Biomarker Identification and Cancer Classification Based on Microarray Data Using Laplace Naive Bayes Model with Mean Shrinkage

Abstract

Biomarker identification and cancer classification are two closely related problems. In gene expression data sets, genes that share a biological pathway can be highly correlated. Moreover, gene expression data sets may contain outliers arising from chemical or electrical causes. A good gene selection method should therefore account for such group effects and be robust to outliers. In this paper, we propose a Laplace naive Bayes model with mean shrinkage (LNB-MS). The Laplace distribution, rather than the normal distribution, is used as the class-conditional distribution of the samples because it is less sensitive to outliers and has been applied in many fields. The key technique is an L1 penalty imposed on the mean of each class, which achieves automatic feature selection. The objective function of the proposed model is piecewise linear in the mean of each class, so its optimum can be found simply by evaluating it at the breakpoints. An efficient algorithm is designed to estimate the model parameters, and a new strategy is introduced that uses the number of selected features to control the regularization parameter. Experimental results on simulated data sets and 17 publicly available cancer data sets attest to the accuracy, sparsity, efficiency, and robustness of the proposed algorithm. Many of the biomarkers identified with our method have been verified in biochemical or biomedical research. An analysis of the biological and functional correlation of the selected genes based on Gene Ontology (GO) terms shows that the proposed method ensures that highly correlated genes are selected together.
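The abstract describes the core estimation step only in outline. The following is a minimal sketch under assumptions that are ours, not necessarily the paper's: each class location is written as a shared per-feature center (here the pooled median) plus a class offset, the L1 penalty is placed on those offsets, and the class-conditional Laplace scales are estimated once and then held fixed. Under these assumptions the penalized objective for one offset is convex and piecewise linear, so it can be minimized by evaluating it at its breakpoints, and a feature is discarded when all of its class offsets are zero. All function names (`shrunk_offset`, `fit_lnb_ms`, `predict`, `choose_lambda`) are hypothetical, and `choose_lambda` shows one possible way (bisection) to let a target number of selected features set the regularization parameter.

```python
import numpy as np


def shrunk_offset(x, center, scale, lam):
    """Minimise sum_i |x_i - center - d| / scale + lam * |d| over the offset d.

    The objective is convex and piecewise linear in d, so a minimiser lies at a
    breakpoint: d = 0 or d = x_i - center for some sample i.
    """
    x = np.asarray(x, dtype=float)
    # Put d = 0 first so that ties resolve towards the sparse (zero) solution.
    candidates = np.concatenate(([0.0], x - center))
    # Evaluate the objective at every breakpoint: O(n^2), fine for a sketch.
    data_term = np.abs(x[None, :] - center - candidates[:, None]).sum(axis=1)
    objective = data_term / scale + lam * np.abs(candidates)
    return candidates[np.argmin(objective)]


def fit_lnb_ms(X, y, lam):
    """Fit class locations center_j + offset_kj with an L1 penalty on the offsets."""
    classes = np.unique(y)
    center = np.median(X, axis=0)  # assumption: pooled median as shared center
    offsets = np.zeros((len(classes), X.shape[1]))
    scales = np.zeros_like(offsets)
    priors = np.zeros(len(classes))
    for k, c in enumerate(classes):
        Xk = X[y == c]
        priors[k] = len(Xk) / len(X)
        # Laplace scale MLE around the unpenalised class median, held fixed here.
        scales[k] = np.mean(np.abs(Xk - np.median(Xk, axis=0)), axis=0) + 1e-12
        for j in range(X.shape[1]):
            offsets[k, j] = shrunk_offset(Xk[:, j], center[j], scales[k, j], lam)
    # A feature is selected if at least one class location moved off the center.
    selected = np.flatnonzero(np.any(offsets != 0.0, axis=0))
    return classes, center, offsets, scales, priors, selected


def predict(model, X):
    """Naive Bayes decision using only the selected features."""
    classes, center, offsets, scales, priors, selected = model
    loc = (center + offsets)[:, selected]  # (K, s)
    b = scales[:, selected]                # (K, s)
    Xs = X[:, selected][:, None, :]        # (n, 1, s)
    # Laplace log-density -|x - mu| / b - log(2b), summed over features.
    log_lik = -(np.abs(Xs - loc) / b + np.log(2.0 * b)).sum(axis=2)
    return classes[np.argmax(log_lik + np.log(priors), axis=1)]


def choose_lambda(X, y, n_target, iters=40):
    """Bisection for (approximately) the smallest lam selecting <= n_target features."""
    hi = 1.0
    while len(fit_lnb_ms(X, y, hi)[-1]) > n_target:
        hi *= 2.0  # a large enough lam zeroes every offset, so this terminates
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if len(fit_lnb_ms(X, y, mid)[-1]) > n_target:
            lo = mid
        else:
            hi = mid
    return hi
```

For example, `lam = choose_lambda(X, y, 20)` followed by `fit_lnb_ms(X, y, lam)` would keep at most 20 genes. Bisection is valid in this formulation because each offset becomes zero once the regularization parameter exceeds a per-offset threshold, so the number of selected features never increases as the parameter grows.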