Temporal Join Processing with Hilbert Curve Space Mapping.

Abstract

Managing data with a time dimension increases storage and query-processing overhead in large database applications, especially for the join operation, a commonly used and expensive relational operator whose cost depends on the size of the input relations. An index-based approach has been shown to improve join processing, which in turn improves the performance of querying historical data. Temporal data consist of tuples, each associated with a time interval whose valid life span varies in length. In join processing on temporal data, tuples with longer life spans tend to overlap a greater number of joining tuples, so they are likely to be accessed more often. The efficient performance of a temporal join over index-clustered data is the main theme of this work. Intervals with an extended range make join evaluation harder because temporal data are intrinsically multidimensional.

Some temporal join processing methods duplicate tuples with long life spans to cluster similar data, which improves performance on tuples that tend to be accessed more frequently. The proposed Hilbert-Temporal Join (Hilbert-TJ) algorithm avoids the need for data duplication by mapping temporal data into Hilbert curve space, which is inherently clustered and thus allows fast retrieval and storage. A balanced B+ tree index structure was implemented to manage and query the data. The query method identifies data pages containing matching tuples that intersect a multidimensional region. Because data pages consist of contiguously mapped points on the curve, the query process traverses the curve successively, iteratively partitioning the data space to determine the next page that intersects the query region.
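To make the idea of mapping temporal data into Hilbert curve space concrete, here is a minimal sketch. It is not the dissertation's implementation. It assumes that each interval is treated as a two-dimensional point (start, end) on an n × n grid, with n a power of two, and it uses the well-known iterative coordinate-to-index conversion for the 2D Hilbert curve. The names `xy2d`, `_rotate`, and `hilbert_key` are hypothetical and chosen for illustration only.

```python
def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve has the standard orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Return the Hilbert index of cell (x, y) on an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_key(start: int, end: int, n: int) -> int:
    """Assumption: an interval [start, end] is mapped to the 2D point (start, end)."""
    if not (0 <= start <= end < n):
        raise ValueError("interval must satisfy 0 <= start <= end < n")
    return xy2d(n, start, end)


if __name__ == "__main__":
    # Sorting intervals by Hilbert key yields an order that could be bulk-loaded
    # into a one-dimensional index such as a B+ tree.
    intervals = [(0, 3), (2, 7), (5, 6), (1, 1)]
    for iv in sorted(intervals, key=lambda iv: hilbert_key(iv[0], iv[1], 8)):
        print(iv, hilbert_key(iv[0], iv[1], 8))
```

Because consecutive Hilbert indices always belong to neighboring grid cells, intervals with similar start and end times tend to receive nearby keys, which is the clustering property the abstract relies on.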
The proposed Adaptive Replacement Cache-Temporal Data (ARC-TD) buffer replacement policy builds on the Adaptive Replacement Cache (ARC) policy by favoring the cache retention of data pages in proportion to the average life span of the tuples in the buffer. Giving preference to tuples with long life spans produced a higher cache hit ratio. The caching priority is also balanced between recently and frequently accessed data.

An evaluation and comparison study determined the performance of the proposed Hilbert-TJ algorithm relative to a nested-loop join, a sort-merge join, and a partition-based join algorithm that use a multiversion B+ tree (MVBT) index. The metrics compare the processing time (disk I/O time plus CPU time), the cache hit ratio, and the index storage size needed to perform the temporal join. The study compared results under the Least Recently Used (LRU), Least Frequently Used (LFU), ARC, and the new ARC-TD buffer replacement policies. Under the given conditions, the expected outcome was that reducing data redundancy and considering the longevity of frequently accessed temporal data achieves better performance. Additionally, the Hilbert-TJ algorithm supports both valid-time and transaction-time data.
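For orientation, the simplest of the baselines named above, a nested-loop temporal join, can be sketched as follows. This is an illustrative in-memory version only: it uses no MVBT index or buffer manager, it assumes closed integer intervals, and the names `TemporalTuple`, `overlaps`, and `nested_loop_temporal_join` are hypothetical rather than taken from the dissertation.

```python
from typing import Iterable, Iterator, NamedTuple


class TemporalTuple(NamedTuple):
    key: str
    start: int  # inclusive start of the life span
    end: int    # inclusive end of the life span (assumed closed interval)


def overlaps(a: TemporalTuple, b: TemporalTuple) -> bool:
    """Two closed intervals intersect when each starts before the other ends."""
    return a.start <= b.end and b.start <= a.end


def nested_loop_temporal_join(
    r: Iterable[TemporalTuple], s: Iterable[TemporalTuple]
) -> Iterator[tuple[TemporalTuple, TemporalTuple, tuple[int, int]]]:
    """Yield every overlapping pair together with its intersection interval."""
    s_list = list(s)  # the inner relation is rescanned for each outer tuple
    for a in r:
        for b in s_list:
            if overlaps(a, b):
                yield a, b, (max(a.start, b.start), min(a.end, b.end))


if __name__ == "__main__":
    r = [TemporalTuple("r1", 0, 10), TemporalTuple("r2", 5, 6)]
    s = [TemporalTuple("s1", 8, 12), TemporalTuple("s2", 20, 30)]
    for a, b, common in nested_loop_temporal_join(r, s):
        print(a.key, b.key, common)
```

In this example the long-lived tuple `r1` matches `s1` while the short-lived `r2` matches nothing, a small instance of the observation that tuples with longer life spans overlap more joining tuples.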

Bibliographic record

  • Author: Raigoza, Jaime A.
  • Author affiliation: Nova Southeastern University
  • Degree-granting institution: Nova Southeastern University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 153 p.
  • Original format: PDF
  • Language: English
  • Date added to database: 2022-08-17 11:41:08
