Journal of Parallel and Distributed Computing

A Parallel Multilevel Feature Selection algorithm for improved cancer classification


Abstract

Biological data tends to grow exponentially, and processing it consumes ever more resources, time and manpower. Parallelizing algorithms can reduce overall execution time, but parallelizing computational methods raises two main challenges: (1) biological data is multi-dimensional in nature, and (2) parallel algorithms reduce execution time at the cost of reduced prediction accuracy. This paper targets both issues and proposes the following approaches: (1) vertical partitioning of data along the feature space and horizontal partitioning along samples, to ease data parallelism; and (2) a Parallel Multilevel Feature Selection (M-FS) algorithm that selects optimal and important features for improved classification of cancer sub-types. The selected features are evaluated using parallel Random Forest on Spark and compared with previously reported results and with the results of sequential execution of the same algorithms. The proposed parallel M-FS algorithm is also compared with existing parallel feature selection algorithms in terms of accuracy and execution time. The results show that the parallel multilevel feature selection algorithm improved cancer classification, yielding prediction accuracy ranging from ~85% to ~99% with very high speed-up and execution times measured in seconds. In contrast, existing sequential algorithms yielded prediction accuracy of ~65% to ~99% with execution times of more than 24 hours.
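The two partitioning schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper runs on Spark, whereas the code below is plain Python on a toy matrix, and the helper names `split_ranges`, `vertical_partition` and `horizontal_partition` are hypothetical, introduced only for illustration. Rows are assumed to be samples and columns features.

```python
from typing import List

Matrix = List[List[float]]


def split_ranges(n: int, parts: int) -> List[range]:
    """Split range(n) into `parts` contiguous, near-equal ranges."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        # The first `extra` ranges get one additional element.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def vertical_partition(data: Matrix, parts: int) -> List[Matrix]:
    """Partition along the feature space: each block keeps every
    sample (row) but only a contiguous slice of the columns."""
    n_features = len(data[0]) if data else 0
    return [
        [[row[j] for j in cols] for row in data]
        for cols in split_ranges(n_features, parts)
    ]


def horizontal_partition(data: Matrix, parts: int) -> List[Matrix]:
    """Partition along samples: each block keeps every feature
    (column) but only a contiguous slice of the rows."""
    return [
        [list(data[i]) for i in rows]
        for rows in split_ranges(len(data), parts)
    ]


# Toy matrix (illustrative only): 4 samples x 6 features.
toy = [[float(10 * r + c) for c in range(6)] for r in range(4)]

feature_blocks = vertical_partition(toy, 3)   # 3 blocks, each 4 x 2
sample_blocks = horizontal_partition(toy, 2)  # 2 blocks, each 2 x 6
```

In a sketch like this, each feature block could be scored for feature relevance by a separate worker, while each sample block could be used for data-parallel model training. How the paper actually assigns partitions to Spark tasks is not stated in the abstract.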
