Journal: Current Bioinformatics

Feature Classification and Analysis of Lung Cancer Related Genes Through Gene Ontology and KEGG Pathways


Abstract

Characterizing cancer-related genes is important and challenging in both biomedicine and computational biology. Lung cancer is one of the leading causes of cancer mortality worldwide, accounting for over one million deaths each year. It is generally classified as either small-cell lung cancer (SCLC) or non-small-cell lung cancer (NSCLC). Although great advances have been made in lung cancer detection and treatment, the 5-year survival rate of patients is still less than 15%. It is therefore very important to identify all potential lung cancer-related genes as well as their interaction networks. In this research, we present a novel computational framework based on the support vector machine (SVM) to predict lung cancer-related genes. We retrieved 59 NSCLC-related genes and 89 SCLC-related genes from KEGG pathways, and randomly selected 2950 non-NSCLC genes and 4450 non-SCLC genes from the Ensembl database. Ten datasets were constructed by dividing the genes into 10 groups. Each gene was encoded as a 13,126-dimensional vector comprising 12,887 Gene Ontology (GO) enrichment scores and 239 KEGG enrichment scores. A feature extraction strategy was applied to obtain an optimal feature set: 400 GO terms and 47 KEGG pathways for NSCLC, and 458 GO terms and 27 KEGG pathways for SCLC. Further feature analysis showed that these optimal features are actively involved in lung tumorigenesis. The analysis also supports our method as an effective tool for predicting cancer-related genes, with the potential to be applied broadly to the prediction of genes related to other types of cancer.
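The dataset construction can be illustrated with a short sketch. The abstract does not say how the 10 groups were formed, so the code assumes one plausible reading: the randomly selected negative genes are split into 10 equal groups, and each group is paired with all of the positive genes. The counts fit this reading (2950 = 10 × 295 and 4450 = 10 × 445, which gives 5 negatives per positive in each dataset), but it remains an assumption. The function name `build_datasets` and the gene identifiers are hypothetical placeholders, not taken from the paper.

```python
import random

# Feature counts stated in the abstract.
N_GO_FEATURES = 12_887
N_KEGG_FEATURES = 239
VECTOR_DIM = N_GO_FEATURES + N_KEGG_FEATURES  # 13,126 dimensions per gene


def build_datasets(positives, negatives, n_groups=10, seed=0):
    """Split the negatives into n_groups equal parts and pair each part
    with all positives. This is an ASSUMED reading of the abstract,
    not a confirmed description of the authors' procedure."""
    if len(negatives) % n_groups != 0:
        raise ValueError("negatives must divide evenly into n_groups")
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_groups
    datasets = []
    for i in range(n_groups):
        group = shuffled[i * size:(i + 1) * size]
        # Label 1 = cancer-related gene, label 0 = randomly selected gene.
        samples = [(g, 1) for g in positives] + [(g, 0) for g in group]
        datasets.append(samples)
    return datasets


# Placeholder identifiers standing in for real gene IDs.
nsclc_pos = [f"NSCLC_{i}" for i in range(59)]
nsclc_neg = [f"nonNSCLC_{i}" for i in range(2950)]
nsclc_sets = build_datasets(nsclc_pos, nsclc_neg)

sclc_pos = [f"SCLC_{i}" for i in range(89)]
sclc_neg = [f"nonSCLC_{i}" for i in range(4450)]
sclc_sets = build_datasets(sclc_pos, sclc_neg)
```

Under this reading, each NSCLC dataset holds 59 positives and 295 negatives, and each SCLC dataset holds 89 positives and 445 negatives. An SVM would then be trained on the 13,126-dimensional enrichment vectors of the genes in each dataset; that step is omitted here because the abstract gives no details of the kernel or parameters.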
