Managing large multidimensional datasets inside a database system.

Abstract

This thesis develops techniques to manage large amounts of multidimensional data inside a database system. To handle multidimensional data efficiently, we need access methods (AMs) that selectively and associatively access data items in a large collection. Commercial databases lag far behind in their support for multidimensional access methods. In this thesis, we design and implement the hybrid tree, a multidimensional index structure that scales to high-dimensional spaces. The hybrid tree combines the positive aspects of the two types of multidimensional index structures, namely data partitioning (e.g., R-tree and derivatives) and space partitioning (e.g., kdB-tree and derivatives), to achieve search performance that scales better to high dimensionalities than either technique alone. Our experiments show that the hybrid tree scales well to high dimensionalities for real-life datasets.

To achieve further scalability, we develop the local dimensionality reduction (LDR) technique to reduce the dimensionality of high-dimensional data. LDR exploits local, as opposed to global, correlations in the data and hence can reduce dimensionality with significantly lower loss of distance information than global dimensionality reduction techniques. This implies fewer false positives and hence better search performance.

To enable efficient similarity search on time series data, we develop a dimensionality reduction technique for time series called Adaptive Piecewise Constant Approximation (APCA). APCA adapts locally to each time series object in the database and chooses the best reduced representation for that object. We show how the APCA representation can be indexed using a multidimensional index structure. Our experiments show that APCA outperforms the other techniques by one to two orders of magnitude in terms of search performance.

Before multidimensional index structures can be supported as AMs in "commercial-strength" database systems, efficient techniques to provide transactional access to data via the index structure must be developed. We develop concurrency control techniques for multidimensional index structures.

To handle the huge data volumes and fast response time requirements of decision support applications, we develop an approximate query processing technique based on multidimensional wavelets. Our technique constructs compact synopses (comprising wavelet coefficients) of the relevant database tables and subsequently answers any SQL query by working exclusively on the compact synopses. Our approach provides more accurate answers and faster response times than other approximate query answering techniques.
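The idea behind APCA, a piecewise-constant approximation whose segment lengths adapt to each series, can be illustrated with a minimal sketch. The abstract does not describe how the thesis computes the segments, so the greedy bottom-up merge below is a simple stand-in and not the thesis's algorithm. The function names `apca_greedy` and `reconstruct`, and the choice to store each segment as a (mean, right endpoint) pair, are assumptions made for this illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (segment mean, inclusive right endpoint index).
Segment = Tuple[float, int]


def _sse(stats: List[float]) -> float:
    """Squared error of a segment around its mean, from [sum, sum_sq, count]."""
    total, sum_sq, count = stats
    return sum_sq - total * total / count


def apca_greedy(series: List[float], num_segments: int) -> List[Segment]:
    """Approximate `series` by `num_segments` constant segments of varying length.

    Stand-in heuristic (not the thesis's method): start with one segment per
    point and repeatedly merge the adjacent pair whose merge adds the least
    squared error.
    """
    if not 1 <= num_segments <= len(series):
        raise ValueError("num_segments must be between 1 and len(series)")

    # Running statistics per segment: [sum, sum of squares, count].
    segs = [[x, x * x, 1] for x in series]

    while len(segs) > num_segments:
        best_i, best_cost = 0, float("inf")
        for i in range(len(segs) - 1):
            a, b = segs[i], segs[i + 1]
            merged = [a[0] + b[0], a[1] + b[1], a[2] + b[2]]
            cost = _sse(merged) - _sse(a) - _sse(b)
            if cost < best_cost:
                best_i, best_cost = i, cost
        a, b = segs[best_i], segs[best_i + 1]
        segs[best_i:best_i + 2] = [[a[0] + b[0], a[1] + b[1], a[2] + b[2]]]

    # Convert running statistics to (mean, right endpoint) pairs.
    result, end = [], -1
    for total, _, count in segs:
        end += count
        result.append((total / count, end))
    return result


def reconstruct(segments: List[Segment]) -> List[float]:
    """Expand (mean, right endpoint) pairs back into a full-length series."""
    out: List[float] = []
    for mean, right in segments:
        out.extend([mean] * (right + 1 - len(out)))
    return out


series = [1, 1, 1, 1, 5, 5, 9, 9, 9, 9, 9, 9]
approx = apca_greedy(series, 3)
print(approx)
print(reconstruct(approx))
```

Because segment boundaries are chosen per series, flat stretches are covered by long segments and busy stretches by short ones, so a fixed budget of segments loses less information than equal-length segments would. Each segment contributes two numbers, which is what makes such a representation a candidate for a multidimensional index.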

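Bibliographic record

  • Author

    Chakrabarti, Kaushik

  • Author affiliation

    University of Illinois at Urbana-Champaign

  • Degree-granting institution: University of Illinois at Urbana-Champaign
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 169 p.
  • Total pages: 169
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology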