2nd IEEE International Conference on Cloud Computing Technology and Science

Performance Considerations of Data Acquisition in Hadoop System

Abstract

Data have become increasingly important in recent years, especially for large companies, and mining useful information from these data is of great benefit. The oil and gas industry has to deal with vast amounts of data, in both real-time and historical contexts. Because the amount of data is so large, processing it is usually infeasible or very time-consuming. In our project we investigate the use of Hadoop to solve this problem. To run Hadoop jobs, the data must first exist in the Hadoop file system, which creates the problem of data acquisition. In this paper, two solutions are investigated and their performance is compared; the solution based on Chukwa is demonstrated to be more efficient than a naïve implementation, in particular for larger file sizes.