Science China Life Sciences

Predicting potential cancer genes by integrating network properties, sequence features and functional annotations

Abstract

The discovery of novel cancer genes is one of the main goals of cancer research. Bioinformatics methods can accelerate cancer gene discovery, which may aid both the understanding of cancer and the development of drug targets. In this paper, we describe a classifier we developed to predict potential cancer genes by integrating multiple lines of biological evidence, including protein-protein interaction network properties, sequence features and functional features. We detected 55 features that differed significantly between cancer genes and non-cancer genes, and chose 14 cancer-associated features to train the classifier. Four machine learning methods were explored in the classifier models to distinguish cancer genes from non-cancer genes: logistic regression, support vector machines (SVMs), BayesNet and decision tree. The predictive power of the different models was evaluated by 5-fold cross-validation. The area under the receiver operating characteristic curve (AUC) for the logistic regression, SVM, BayesNet and J48 decision tree models was 0.834, 0.740, 0.800 and 0.782, respectively. Finally, the logistic regression classifier with multiple biological features was applied to the genes in the Entrez database, and 1,976 cancer gene candidates were identified. We found that the integrated prediction model performed much better than models based on individual types of biological evidence, and that network and functional features had greater predictive power than sequence features for predicting cancer genes.
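The abstract reports cross-validated AUC values without showing how such a figure is obtained. As a minimal sketch under stated assumptions, the pure-Python code below estimates 5-fold cross-validated AUC for a logistic regression model. It is not the authors' pipeline: the stratified fold split, the unregularized batch gradient-descent fit, the learning rate and epoch count, and every helper name (`roc_auc`, `fit_logistic`, `stratified_folds`, `cross_validated_auc`, `make_toy_data`) are hypothetical, and the toy data are synthetic rather than the paper's 14 features. AUC is computed in its rank (Mann-Whitney) form: the fraction of positive-negative pairs in which the positive gene receives the higher score, with ties counted as one half.

```python
import math
import random


def roc_auc(labels, scores):
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Hypothetical fit: batch gradient descent on mean log-loss, no regularization."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def stratified_folds(y, k=5, seed=0):
    """Split indices into k folds, dealing each class round-robin after shuffling."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validated_auc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the k AUCs."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        model = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = predict_proba(model, [X[i] for i in test_idx])
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return sum(aucs) / len(aucs)


def make_toy_data(n=200, n_pos=60, n_features=3, shift=1.0, seed=42):
    """Synthetic data, NOT the paper's: label-1 'genes' have shifted feature means."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i < n_pos else 0
        X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"5-fold CV AUC (toy data): {cross_validated_auc(X, y):.3f}")
```

Only the logistic regression arm is sketched here; an SVM, Bayesian network or decision tree learner would slot into the same loop in place of `fit_logistic`. Stratifying the folds guarantees that every held-out fold contains both cancer and non-cancer genes, which the pairwise AUC needs in order to be defined.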