Optimized Data Placement for Column-Oriented Data Store in the Distributed Environment

Abstract

Column-oriented data storage has become a buzzword for its high efficiency in massive data access, its high compression ratio on individual columns, and similar benefits. However, these initial observations turn out not to be trivially true. Compared with the improvements in other computer components over the past four decades, the seek time and bandwidth of current hard disk drives (HDDs) are increasingly the bottleneck for massive data processing. In this paper, we propose a novel data placement strategy for massive data analysis (i.e., read-optimized) based on Gray code, which greatly increases the ratio of sequential access for diverse query evaluations (e.g., range queries, partial match range queries, and aggregation queries). A centralized/distributed structured index is employed on widely deployed distributed file systems (e.g., GFS), which achieves convenient management, efficient accessibility, and high extendibility. We provide a detailed theoretical analysis of index extendibility, sequential access improvement, and storage capacity usage for the proposed data placement strategies, together with specific algorithms. Our extensive experimental studies confirm the efficiency and effectiveness of the proposed data placement methods.
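The abstract does not spell out the placement algorithm, so the following is only a minimal sketch of the underlying idea, not the paper's method. It assumes storage buckets are identified by fixed-width bit strings and shows the standard binary-reflected Gray code, in which consecutive codewords differ in exactly one bit. Laying buckets out in this order keeps buckets that share most attribute bits close together, which is the kind of locality a Gray-code placement would rely on. The function names (`to_gray`, `from_gray`, `gray_order`) are hypothetical and chosen for illustration.

```python
def to_gray(n: int) -> int:
    """Return the binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray: recover the position (rank) of a Gray codeword."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def gray_order(bits: int) -> list[int]:
    """List all codewords of the given bit width in Gray-code order."""
    return [to_gray(i) for i in range(1 << bits)]


# Hypothetical example: 3-bit bucket identifiers laid out in Gray order.
order = gray_order(3)
print([format(code, "03b") for code in order])
# prints ['000', '001', '011', '010', '110', '111', '101', '100']
```

In this sketch, `from_gray` gives the on-disk position of a bucket from its identifier, so a lookup does not need to scan the ordering.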
