Hadoop and Hive as Scalable Alternatives to RDBMS: A Case Study

Abstract

While high-performance, cost-effective data management solutions such as Hadoop exist for Big Data analysis, small and medium businesses with moderate-sized data sets would also like to implement low-budget data management systems that perform well on their existing data and scale as the amount of accumulated data grows. Parallel database management systems can provide a high-performance solution, but they are expensive and complex to implement. The purpose of this project was to compare the scalability of open-source relational database management systems and distributed data management systems for small and medium data sets. To make this comparison, a business intelligence case study was investigated using three data management solutions: MySQL, Hadoop MapReduce, and Hive. The experiment involved a payment history analysis that considers customer, account, and transaction data for predictive analytics. Experiments were executed on data sets ranging from 200MB to 10GB. The results show that the single-server MySQL solution performs best for trial sizes from 200MB to 1GB but does not scale well beyond that. MapReduce outperforms MySQL on data sets larger than 1GB, and Hive outperforms MySQL on data sets larger than 2GB. This demonstrates that MapReduce and Hive are viable techniques for small and medium businesses that want to implement scalable data management.
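The abstract does not describe how the payment history analysis was expressed as a MapReduce job. As a rough illustration of the general map, shuffle, and reduce pattern such a job follows, here is a minimal in-process Python sketch. Everything specific in it is a hypothetical assumption rather than the author's schema or code: the record layout `(customer_id, account_id, amount, days_late)`, the "late payment" definition (`days_late > 0`), and the helper names `map_phase`, `shuffle`, `reduce_phase`, and `run_job`. A real Hadoop job would distribute these phases across a cluster, and Hive would generate a similar plan from a `GROUP BY` query.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical transaction record: (customer_id, account_id, amount, days_late).
# This layout is an illustrative assumption, not the schema used in the study.
Record = tuple[str, str, float, int]


def map_phase(records: Iterable[Record]) -> Iterator[tuple[str, tuple[int, int, float]]]:
    """Map: emit (customer_id, (payment_count, late_count, amount)) per record."""
    for customer_id, _account_id, amount, days_late in records:
        yield customer_id, (1, 1 if days_late > 0 else 0, amount)


def shuffle(pairs: Iterable[tuple[str, tuple[int, int, float]]]) -> dict[str, list[tuple[int, int, float]]]:
    """Shuffle: group all mapped values by key, as the framework would between phases."""
    groups: dict[str, list[tuple[int, int, float]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[tuple[int, int, float]]) -> tuple[str, dict[str, float]]:
    """Reduce: combine one customer's values into payment-history summary features."""
    payments = sum(v[0] for v in values)
    late = sum(v[1] for v in values)
    total = sum(v[2] for v in values)
    return key, {
        "payments": payments,
        "late": late,
        "late_ratio": late / payments,
        "total": total,
    }


def run_job(records: Iterable[Record]) -> dict[str, dict[str, float]]:
    """Run map -> shuffle -> reduce locally and return per-customer summaries."""
    grouped = shuffle(map_phase(records))
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample = [
        ("c1", "a1", 100.0, 0),
        ("c1", "a1", 50.0, 5),
        ("c2", "a2", 20.0, 0),
    ]
    print(run_job(sample))
```

Because the reduce step only needs values that share a key, each customer's records can be processed independently on separate nodes. This independence is the property that lets the distributed approaches keep up as data sets grow beyond what a single server handles comfortably.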

Bibliographic details

  • Author: Hollingsworth, Marissa Rae
  • Year: 2012
  • Original format: PDF