Venue: International Conference on Data Engineering

Coarse indices for a tape-based data warehouse

Abstract

Data warehouses allow users to make sense of large quantities of detail data. While most queries can be answered through summary data, some queries can only be answered by accessing the detail data. It is usually not cost-effective to store terabytes of detail data online; instead, the detail data is stored on tape. The problem we address in this paper is how to index tape-based detail data. Conventional indices on tens of terabytes of data can require terabytes of storage themselves. We propose the use of coarse indices for tape-based detail data. Instead of specifying the location of every record containing a particular key, the coarse index specifies whether or not a region of tape contains at least one record with a particular key value. Our proposal is based on the observation that while long tape seeks are fast, short tape seeks are slow. Therefore, indices that point to the exact record location on tape do not provide enough of a performance benefit to justify the cost of their storage. A few bits pointing to an appropriate location are enough. In this paper, we present the design of such a coarse index and provide fast algorithms for updating and querying it. Our experiments on a large data set taken from an existing data warehouse show that using compressed bitmap indices offers an order-of-magnitude reduction in index size, permitting the online storage of the coarse indices. Analytical and simulation models of the time to fetch selected records from tape show that using coarse indices almost always reduces the total loading time compared with using dense tape-based indices or using no index at all.
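The coarse-index idea in the abstract can be illustrated with a minimal sketch. It assumes one uncompressed bitmap per key value with one bit per tape region; the class name `CoarseIndex`, its methods, and the sample keys are hypothetical and not taken from the paper, and the paper's compressed bitmap encoding and update algorithms are not reproduced here.

```python
from collections import defaultdict


class CoarseIndex:
    """Toy coarse index: one bitmap per key, one bit per tape region.

    Hypothetical illustration only; the bitmaps here are uncompressed.
    """

    def __init__(self, num_regions):
        self.num_regions = num_regions
        # Python ints act as arbitrary-length bitmaps.
        self.bitmaps = defaultdict(int)

    def add(self, key, region):
        """Record that `region` holds at least one record with `key`."""
        if not 0 <= region < self.num_regions:
            raise ValueError("region out of range")
        # Setting an already-set bit is a no-op, so duplicates cost nothing.
        self.bitmaps[key] |= 1 << region

    def regions_for(self, key):
        """Return, in tape order, the regions that contain `key`."""
        bits = self.bitmaps.get(key, 0)
        return [r for r in range(self.num_regions) if (bits >> r) & 1]


index = CoarseIndex(num_regions=8)
for key, region in [("cust42", 1), ("cust42", 1), ("cust42", 6), ("cust7", 3)]:
    index.add(key, region)

hits = index.regions_for("cust42")     # regions to read from tape
misses = index.regions_for("unknown")  # no region needs to be read
print(hits, misses)
```

A query then seeks only to the listed regions and scans each one for matching records, rather than following an exact per-record pointer.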