Conference: Asia Pacific Web and Web-Age Information Management

Cuttle: Enabling Cross-Column Compression in Distributed Column Stores

Abstract

We observe that, in real-world distributed data warehouse systems, data columns from different sources often exhibit redundancy. Even though these systems can employ both general and column-oriented compression schemes to reduce the data storage pressure, such cross-column redundancy (CCR) is not recognized or exploited effectively. Therefore, we propose Cuttle, a column storage system that enables cross-column compression to reduce CCR. Specifically, we identify three kinds of CCR and develop a referential transformation encoding (RTE) scheme to compress multiple columns of data with CCR. Furthermore, we address the CCR selection problem and propose a greedy algorithm to generate cross-column compression schemes. Our experiments on real-world datasets show that Cuttle can further reduce data size by half after applying both the column-oriented and general compression schemes, and that the query processing performance with Cuttle is improved by 20% without any change to the application programs.
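As a rough illustration of what exploiting cross-column redundancy can look like, the sketch below stores a target column as a transformation of a reference column plus a small table of exceptions. This is not the paper's RTE scheme, whose details the abstract does not give; the function names, the `plus_one` transformation, and the sample columns are all hypothetical and chosen only to make the idea concrete.

```python
from typing import Callable, Dict, List


def encode_referential(ref: List[int], target: List[int],
                       transform: Callable[[int], int]) -> Dict[int, int]:
    """Keep only the rows where target differs from transform(ref).

    Hypothetical helper: rows that the transformation predicts correctly
    need no storage at all; the rest are kept as row-index -> value.
    """
    if len(ref) != len(target):
        raise ValueError("columns must have the same length")
    return {i: t for i, (r, t) in enumerate(zip(ref, target))
            if transform(r) != t}


def decode_referential(ref: List[int], exceptions: Dict[int, int],
                       transform: Callable[[int], int]) -> List[int]:
    """Rebuild the target column from the reference column and exceptions."""
    return [exceptions.get(i, transform(r)) for i, r in enumerate(ref)]


# Made-up example data: the second column is usually the first plus one.
order_ts = [100, 205, 310, 400]
ship_ts = [101, 206, 311, 450]


def plus_one(x: int) -> int:
    return x + 1


exceptions = encode_referential(order_ts, ship_ts, plus_one)
restored = decode_referential(order_ts, exceptions, plus_one)
assert restored == ship_ts  # the encoding is lossless
```

In this toy case only one of the four rows has to be stored explicitly. Deciding which column pairs and transformations are worth encoding this way is the kind of selection problem the abstract says Cuttle addresses with a greedy algorithm.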