Journal: International Journal of Applied Engineering Research

Apache Spark and Hadoop Based Big Data Processing System for Clinical Research


Abstract

The use of big data from the medical field is gaining popularity in healthcare services and clinical research. The medical field is one of the largest sources of data, generating enormous volumes and many varieties of it. Traditional systems are incapable of handling such big data, which is characterized by volume, variety, velocity, veracity and value (the 5 V's). Processing this vast amount of data requires a framework that can process it in parallel on clusters of commodity hardware while remaining reliable and fault-tolerant. Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets. The Hadoop framework supports MapReduce applications that can scale from a single node to thousands of machines. This paper investigates big data used in clinical research to find patients with similar patterns and to identify patients who require intensive care. Patients can also be informed about predicted future outcomes. In this paper we propose a ten-node Hadoop cluster to run distributed MapReduce algorithms. The proposed algorithm processes big clinical data efficiently, and its results can be used to support efficient and personalized decisions for patients. The data sets used to obtain the results are taken from MIMIC-III, an open-source database that is one of the largest repositories of data.
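To make the MapReduce idea in the abstract concrete, the following is a minimal single-process sketch in plain Python, not the authors' algorithm and not Hadoop or Spark code. It assumes hypothetical records of the form (patient_id, heart_rate) and a hypothetical threshold of 100.0 for flagging patients. None of these names or values come from the paper or from MIMIC-III.

```python
from collections import defaultdict

# Hypothetical records: (patient_id, heart_rate). Illustrative only, not MIMIC-III data.
RECORDS = [
    ("p1", 80.0), ("p2", 130.0), ("p1", 90.0),
    ("p3", 70.0), ("p2", 110.0),
]


def map_phase(records):
    """Map step: emit (key, (value, count)) pairs, one per input record."""
    for patient_id, heart_rate in records:
        yield patient_id, (heart_rate, 1)


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine each key's values into a per-patient mean."""
    means = {}
    for key, values in groups.items():
        total = sum(v for v, _ in values)
        count = sum(c for _, c in values)
        means[key] = total / count
    return means


def mean_heart_rate(records):
    """Run map, shuffle and reduce in sequence on a single machine."""
    return reduce_phase(shuffle(map_phase(records)))


def flag_patients(means, threshold=100.0):
    """Return patients whose mean exceeds the (hypothetical) threshold."""
    return sorted(p for p, m in means.items() if m > threshold)


if __name__ == "__main__":
    means = mean_heart_rate(RECORDS)
    print(means)
    print(flag_patients(means))
```

On a real cluster the map and reduce steps would run on different nodes and the framework would perform the shuffle. Here all three run in one process only to show how the data flows between them.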
