A three-dimensional data model in HBase for large time-series dataset analysis

Abstract

In the transition of applications from traditional enterprise infrastructures to cloud infrastructures, scalable database management systems play an important role in efficiently managing and analysing unprecedented volumes of data. Compared to RDBMSs, NoSQL databases are more attractive for addressing this challenge. However, it is not easy for non-expert users to manage data effectively in a NoSQL database, because such databases offer little support for data organization. A poor data organization may inadvertently misuse the features of the NoSQL database and yield unsatisfactory performance. Therefore, developing a systematic method for NoSQL data-schema design is a timely and important problem for researchers and practitioners. HBase, as a particular NoSQL database offering, relies (a) on HDFS for its distributed and replicated storage, and (b) on coprocessors for efficient parallel query processing. To harness the potential benefits of this parallelism, the data must be partitioned appropriately across the HBase storage. We investigate the effectiveness of the three-dimensional data model, which uses the "version" dimension of HBase to store the values of a data item over time. We have experimented with and evaluated the performance impact of this type of data model on two data sets of different sizes and different time lengths. For each of these data sets, we have compared the performance of several ad-hoc queries, implemented with the HBase Coprocessor framework, across different data schemas, some of which use the third HBase dimension and some of which do not. The experimental results demonstrate improved performance with the data schemas that use the third dimension of HBase.
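
To make the idea of a "version" dimension concrete, the sketch below models a table as row key, then column, then timestamped versions, and stores each reading of a data item as a new version of the same cell. It is a toy in-memory illustration, not the HBase client API and not the schemas evaluated in the paper; the class `VersionedTable`, the row key `sensor-42`, the column `d:temp`, and the sample readings are all hypothetical. The half-open time range and newest-first ordering are chosen to resemble how HBase returns cell versions.

```python
class VersionedTable:
    """Toy stand-in for a table with three dimensions:
    row key -> column -> {timestamp: value}. Illustration only."""

    def __init__(self):
        self._cells = {}

    def put(self, row, column, timestamp, value):
        # Each write with a new timestamp becomes another version
        # of the same (row, column) cell instead of a new row.
        versions = self._cells.setdefault(row, {}).setdefault(column, {})
        versions[timestamp] = value

    def get_versions(self, row, column, t_min, t_max):
        # Return (timestamp, value) pairs with t_min <= timestamp < t_max,
        # newest first.
        versions = self._cells.get(row, {}).get(column, {})
        selected = [(ts, v) for ts, v in versions.items() if t_min <= ts < t_max]
        return sorted(selected, reverse=True)


# Hypothetical time series: one row per sensor, readings stored as versions.
table = VersionedTable()
for ts, temp in [(1000, 20.5), (2000, 21.0), (3000, 21.7)]:
    table.put("sensor-42", "d:temp", ts, temp)

# A time-window query touches a single cell rather than scanning many rows.
recent = table.get_versions("sensor-42", "d:temp", 2000, 4000)
print(recent)
```

A schema that does not use the third dimension would instead encode the timestamp in the row key or in the column name, so that each reading occupies its own row or column; the abstract's comparison is between these kinds of layouts.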