Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Disease Gene Prediction by Integrating PPI Networks, Clinical RNA-Seq Data and OMIM Data


Abstract

Disease gene prediction is a challenging task with a variety of applications, such as early diagnosis and drug development. Existing machine learning methods suffer from sample imbalance because the number of known disease genes (positive samples) is far smaller than the number of unknown genes, which are typically treated as negative samples. In addition, most methods do not use clinical data from patients with a specific disease to predict disease genes. In this study, we propose a disease gene prediction algorithm, called dgSeq, that combines protein-protein interaction (PPI) networks, clinical RNA-Seq data, and Online Mendelian Inheritance in Man (OMIM) data. dgSeq constructs differential networks based on rewiring information calculated from clinical RNA-Seq data. To select balanced sets of non-disease genes (negative samples), a disease-gene network is also constructed from OMIM data. After features are extracted from the PPI networks and the differential networks, logistic regression classifiers are trained. dgSeq obtains AUC values of 0.88, 0.83, and 0.80 for identifying breast cancer genes, thyroid cancer genes, and Alzheimer's disease genes, respectively, outperforming three competing methods. Both gene set enrichment analysis and the predicted results indicate that dgSeq can effectively predict new disease genes.
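
Two ideas in the abstract, balanced negative sampling and evaluation by AUC, can be illustrated with a minimal Python sketch. This is not the dgSeq implementation: the abstract does not say how negatives are chosen from the disease-gene network, so the sketch assumes plain random sampling from genes that are not known disease genes. The function names `sample_balanced_negatives` and `auc` and the gene labels `g1` to `g6` are hypothetical.

```python
import random


def sample_balanced_negatives(candidates, positives, seed=0):
    """Draw as many negatives as there are positives.

    Assumption: negatives are sampled uniformly at random from candidate
    genes that are not known disease genes. dgSeq itself uses a
    disease-gene network built from OMIM data for this selection.
    """
    pos = set(positives)
    pool = sorted(g for g in candidates if g not in pos)
    if len(pool) < len(pos):
        raise ValueError("not enough candidate genes to balance the positives")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.sample(pool, len(pos))


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive gene receives the higher score; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]  # toy gene identifiers
    known = ["g1", "g2"]                          # toy positive samples
    negatives = sample_balanced_negatives(genes, known)
    print(len(negatives) == len(known))
    # Toy classifier scores: one positive is ranked below one negative.
    print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

In practice the scores passed to `auc` would be the predicted probabilities from a trained classifier; the pairwise form is used here only because it makes the meaning of an AUC such as 0.88 explicit: a randomly chosen disease gene is scored above a randomly chosen non-disease gene about 88% of the time.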
