Improved query processing and data representation techniques.

Abstract

This thesis presents new research on two topics: compression in relational database systems, and nearest neighbor processing techniques.

With the increasing speed of CPUs relative to disks, using compression as a means of improving disk information throughput is becoming very attractive. While it might at first seem reasonable to use traditional compression algorithms such as Lempel-Ziv, these algorithms are unsuitable because they require uncompressing a large portion of the file to retrieve a small piece of data.

Motivated by this observation, we have developed a compression algorithm for fixed-length data that overcomes this problem. The algorithm is simple, and can easily be added to the file management layer of a DBMS since it supports the usual technique of identifying a record by a ⟨pageid, slotid⟩ pair (tuple ID). In addition, this algorithm compresses hyper-rectangle-based indexing structures such as R-Trees.

This thesis also describes the relationship between sort orders, multidimensional bulk loading, and compression ratios. Also included in this thesis is an extensive suite of experiments that show the applicability of our compression technique, especially for data warehousing workloads.

In recent years, many researchers have focused on finding efficient solutions to the nearest neighbor problem. While there have been many efforts to find faster-than-linear-scan processing strategies for these tasks, there has been no success at solving the general problem.

While previous work shows that the problem cannot be solved efficiently in general, we present a technique for either in-memory or secondary storage that is guaranteed to perform well in the “good” situations previously described. Furthermore, it is the first strategy that allows a database administrator to trade off space for execution time. It is also the first nearest neighbor strategy whose behavior is completely understood. Finally, there are variants of the problem that perform far better than alternative techniques on “hard” cases.

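Bibliographic record

  • Author

    Goldstein, Jonathan David.

  • Author affiliation

    The University of Wisconsin - Madison.

  • Degree-granting institution: The University of Wisconsin - Madison.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 1999
  • Pages: 146 p.
  • Total pages: 146
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology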
