Conference: International Conference on Intelligent Computing

Identifying Cancer Biomarkers from High-Throughput RNA Sequencing Data by Machine Learning

Abstract

During cancer progression, the expression levels of relevant genes change significantly in tumors compared with their healthy counterparts. The discovery of specific genes that serve as biomarkers is therefore of practical significance for diagnosis and prognosis. Available high-throughput '-omic' datasets, such as the public RNA-sequencing data generated by The Cancer Genome Atlas (TCGA) consortium, provide unprecedented resources and opportunities for deriving cancer biomarkers. Here, we explore the identification of biomarker genes in 12 types of cancer, based on how well they classify control and disease samples by machine learning. We first identify differentially expressed genes individually. We then perform feature selection by integrating recursive feature reduction and random forest classification with feature ranking. The final number of features is determined by a parsimony principle: the features should be as few as possible while still achieving the highest classification accuracy. For each cancer, the biomarker genes are then evaluated by tenfold cross-validation with several classification algorithms. We find that the extreme learning machine achieves the best classification performance among the compared methods. Further gene enrichment analyses indicate the dysfunctional and pathogenic mechanisms associated with the identified biomarkers.
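
The feature-selection step described in the abstract (recursive feature reduction guided by a ranking, followed by a parsimony rule) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `rank` and `score` are hypothetical stand-ins for random forest feature ranking and cross-validated classification accuracy, and the gene names and toy numbers are invented for the example.

```python
from typing import Callable, List, Sequence, Tuple

# Each trace entry pairs a gene subset with the accuracy it achieved.
Trace = List[Tuple[List[str], float]]


def recursive_reduction(
    genes: Sequence[str],
    rank: Callable[[List[str]], List[str]],
    score: Callable[[List[str]], float],
) -> Trace:
    """Drop the least important gene one at a time, scoring every subset.

    rank(subset)  -> subset ordered from most to least important
                     (assumed stand-in for random forest feature ranking).
    score(subset) -> classification accuracy of the subset
                     (assumed stand-in for a cross-validated classifier).
    """
    current = list(genes)
    trace: Trace = []
    while current:
        trace.append((list(current), score(current)))
        current = rank(current)[:-1]  # remove the lowest-ranked gene
    return trace


def pick_parsimonious(trace: Trace, tol: float = 0.0) -> List[str]:
    """Return the smallest subset whose accuracy is within tol of the best."""
    best = max(acc for _, acc in trace)
    candidates = [subset for subset, acc in trace if acc >= best - tol]
    return min(candidates, key=len)


# Hypothetical toy data: fixed importances, and an accuracy that stops
# improving once the three informative genes are all present.
IMPORTANCE = {"G1": 0.40, "G2": 0.30, "G3": 0.20, "G4": 0.06, "G5": 0.04}
INFORMATIVE = {"G1", "G2", "G3"}


def toy_rank(subset: List[str]) -> List[str]:
    return sorted(subset, key=lambda g: IMPORTANCE[g], reverse=True)


def toy_score(subset: List[str]) -> float:
    return round(0.65 + 0.10 * len(INFORMATIVE & set(subset)), 2)


trace = recursive_reduction(list(IMPORTANCE), toy_rank, toy_score)
selected = pick_parsimonious(trace)
print(selected)
```

In a real analysis, `rank` would typically wrap a fitted random forest's feature importances and `score` a tenfold cross-validated accuracy; the `tol` parameter is an added convenience for treating near-ties as equal and is not mentioned in the abstract.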