Journal: Studies in Informatics and Control

Parallelized Classification of Cancer Sub-types from Gene Expression Profiles Using Recursive Gene Selection


Abstract

Cancer is a chronic disease caused mainly by irregularities in genes, so it is important to identify the oncogenes responsible. Biological data such as gene expression levels, protein sequences, RNA sequences, pathway analyses, pan-cancer analyses, and structural biomarkers can aid cancer diagnosis, classification, and prognosis. This research focuses on classifying cancer subtypes using Microarray Gene Expression (MGE) levels. MGE data are high-dimensional but contain very few samples, so dimensionality reduction is needed to select the relevant genes and remove the redundant ones. The Recursive Feature Selection (RFS) method is proposed because it repeats the gene selection process until the best gene subset is found. The best subset of genes is then used for classification with different models, which are evaluated using 10-fold cross-validation. To scale to large volumes of gene expression data, a parallelized classification model was explored on the Spark framework. The non-parallelized classification model on Weka was compared with the parallelized classification model on Spark. The results showed that the parallelized model performs better than the non-parallelized model in terms of both accuracy and execution time. The performance of RFS combined with the parallelized classifier was also compared with previous approaches, which it outperformed.
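The recursive selection loop described in the abstract can be illustrated with a minimal, single-process Python sketch. The abstract does not specify the ranking criterion, the classifiers, or the stopping rule, so everything below is an assumption made for illustration: the gene score (spread of class means over overall spread), the nearest-centroid classifier, the halving schedule, and all function names (`gene_score`, `cv_accuracy`, `recursive_gene_selection`, `toy_data`) are hypothetical and are not the authors' RFS method or their Spark implementation.

```python
import random
from statistics import mean, pstdev


def gene_score(X, y, g):
    """Assumed ranking criterion: spread of class means over overall spread."""
    class_means = [mean([r[g] for r, lab in zip(X, y) if lab == label])
                   for label in set(y)]
    return (max(class_means) - min(class_means)) / (pstdev([r[g] for r in X]) + 1e-9)


def fit_centroids(X, y, genes):
    """Per-class mean expression vector, restricted to the selected genes."""
    return {label: [mean([r[g] for r, lab in zip(X, y) if lab == label]) for g in genes]
            for label in sorted(set(y))}


def predict(centroids, x, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((x[g] - c) ** 2
                                              for g, c in zip(genes, centroids[lab])))


def cv_accuracy(X, y, genes, k=10, seed=0):
    """k-fold cross-validated accuracy of the nearest-centroid classifier."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train], genes)
        correct += sum(predict(centroids, X[i], genes) == y[i] for i in fold)
    return correct / len(X)


def recursive_gene_selection(X, y, keep_fraction=0.5, min_genes=1, k=10):
    """Repeatedly drop the lowest-ranked genes; keep the best-scoring subset seen."""
    genes = list(range(len(X[0])))
    best_genes, best_acc = genes, cv_accuracy(X, y, genes, k)
    while len(genes) > min_genes:
        ranked = sorted(genes, key=lambda g: gene_score(X, y, g), reverse=True)
        genes = sorted(ranked[:max(min_genes, int(len(ranked) * keep_fraction))])
        acc = cv_accuracy(X, y, genes, k)
        if acc >= best_acc:  # on ties, prefer the smaller subset
            best_genes, best_acc = genes, acc
    return best_genes, best_acc


def toy_data(seed=42):
    """Hypothetical data: 20 samples, 8 genes, only gene 0 separates the classes."""
    rng = random.Random(seed)
    y = [0] * 10 + [1] * 10
    X = [[lab * 5.0 + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(7)]
         for lab in y]
    return X, y
```

Calling `recursive_gene_selection(*toy_data())` returns the retained gene indices and their cross-validated accuracy. Note that this sketch ranks genes on the full data set before cross-validating, which tends to give optimistic accuracy estimates; a more careful evaluation would repeat the selection inside each training fold.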
