Journal: Neural Computing & Applications

Cloud computing-based parallel genetic algorithm for gene selection in cancer classification

Abstract

Cancer classification is one of the main steps in the patient healing process, which compels modern clinical researchers to use advanced bioinformatics methods for it. Cancer classification is usually performed using gene expression data obtained from microarray experiments together with advanced machine learning methods. A microarray experiment generates a huge amount of data, and processing it with machine learning methods is a major challenge. In this study, a two-step classification paradigm that combines genetic algorithm feature selection with machine learning classifiers is used. The genetic algorithm is built in the MapReduce programming style, which makes it highly scalable on a Hadoop cluster. To improve its performance, the proposed algorithm is extended into a parallel algorithm that processes microarray data in a distributed manner using the Hadoop MapReduce framework. The algorithm was tested on eleven GEMS data sets (9 tumors, 11 tumors, 14 tumors, brain tumor 1, lung cancer, brain tumor 2, leukemia 1, DLBCL, leukemia 2, SRBCT, and prostate tumor), and its accuracy reached 100% with fewer than 25 selected features. The proposed cloud computing-based MapReduce parallel genetic algorithm performed well on gene expression data. In addition, the scalability of the proposed algorithm is unlimited because of the underlying Hadoop MapReduce platform. The presented results indicate that the proposed method can be effectively applied to real-world microarray data in a cloud environment, and that the Hadoop MapReduce framework yields a substantial decrease in computation time.
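
The two-step idea in the abstract (a genetic algorithm proposes gene subsets, and a classifier scores them, with scoring arranged as a map step and best-subset selection as a reduce step) can be illustrated with a small single-machine sketch. This is not the authors' implementation. It assumes a hand-written nearest-centroid classifier, leave-one-out accuracy, a small per-gene penalty in the fitness, and synthetic data, and it uses Python's built-in `map` and `functools.reduce` as stand-ins for Hadoop MapReduce. All function names and parameters are hypothetical.

```python
import random
from functools import reduce

def make_data(rng, n_samples=24, n_genes=30, informative=(3, 7)):
    """Synthetic stand-in for a microarray matrix (assumption, not GEMS data)."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:          # only these genes carry class signal
            row[g] += 4.0 * label
        X.append(row)
        y.append(label)
    return X, y

def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on selected genes."""
    genes = [g for g, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        sums, counts = {}, {}
        for j, (xj, yj) in enumerate(zip(X, y)):
            if j == i:                 # hold out sample i
                continue
            acc = sums.setdefault(yj, [0.0] * len(genes))
            for k, g in enumerate(genes):
                acc[k] += xj[g]
            counts[yj] = counts.get(yj, 0) + 1
        def dist(label):
            return sum((xi[g] - sums[label][k] / counts[label]) ** 2
                       for k, g in enumerate(genes))
        if min(sums, key=dist) == yi:
            correct += 1
    return correct / len(X)

def map_fitness(mask, X, y, penalty=0.01):
    """Map step: score one chromosome (0/1 gene mask) independently of the others."""
    return (loo_accuracy(X, y, mask) - penalty * sum(mask), mask)

def reduce_best(a, b):
    """Reduce step: keep the fitter of two (fitness, mask) pairs."""
    return a if a[0] >= b[0] else b

def evolve(X, y, rng, pop_size=16, generations=10, p_mut=0.05):
    n_genes = len(X[0])
    pop = [[1 if rng.random() < 0.2 else 0 for _ in range(n_genes)]
           for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        # Each evaluation is independent, so a cluster could distribute this map.
        scored = list(map(lambda m: map_fitness(m, X, y), pop))
        gen_best = reduce(reduce_best, scored)
        best = gen_best if best is None else reduce_best(best, gen_best)
        def pick():                    # binary tournament selection
            a, b = rng.sample(scored, 2)
            return reduce_best(a, b)[1]
        children = [best[1][:]]        # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            child = [rng.choice(pair) for pair in zip(p1, p2)]   # uniform crossover
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    fitness, mask = evolve(X, y, rng)
    print("selected genes:", [g for g, bit in enumerate(mask) if bit])
    print("penalized fitness:", round(fitness, 3))
```

Because each chromosome is scored without reference to any other, the map step is the natural place to distribute work. The penalty term is one simple way to favor small gene subsets and is an assumption of this sketch rather than something stated in the abstract.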
