首页> 外文学位 >Applied spatial data structures for large data sets.
【24h】

Applied spatial data structures for large data sets.

机译:适用于大型数据集的空间数据结构。

获取原文
获取原文并翻译 | 示例

摘要

Spatial data structures for large data sets need to be particularly efficient. We introduce new data structures for two problems that involve large data sets.; We introduce Parameterized Balanced Aspect Ratio (PBAR) trees. They are instances of Binary Space Partition trees and to partition the space they accept any three partitioning vectors. All partitions are made with hyperplanes orthogonal to these vectors. Given a set of n points in 2-dimensional space, a PBAR tree can be constructed in O(n log n) time such that the depth of the tree is O (log n) and the aspect ratio of each region is bounded by a constant that only depends upon the angles between the partitioning vectors. Like previously introduced BAR trees, that too have bounded aspect ratio, PBAR trees have good upper bounds on the time taken to solve a number of approximate spatial queries like the (1 + ε)-nearest neighbor query and the ε-range query. However, since the bounds on their aspect ratio are better than those for BAR trees, they are expected to perform better at solving these queries. We present empirical results of tests with the (1 + ε)-nearest neighbor query that show PBAR trees outperforming BAR trees.; We also introduce a special form of k-d trees that are very fast in detecting outliers from large data sets in multidimensional spaces. Outliers are data objects that do not comply with the general behavior of the data. Our scheme does not assume any probability model and is linear in the number of objects and in the number of dimensions. It also does not assume the existence of a full dimensional distance metric in the input space. It is suitable for many applications including detecting outliers in astronomical databases—for which none of the existing schemes work. We present empirical results that show our scheme is, in fact, a practical solution.
机译:大型数据集的空间数据结构必须特别有效。我们针对涉及大数据集的两个问题引入了新的数据结构。我们介绍了参数化平衡长宽比(PBAR)树。它们是Binary Space Partition树的实例,并且为了划分空间,它们接受任何三个分区向量。所有分区均由与这些向量正交的超平面组成。给定二维空间中的一组 n 点,可以在 O n log n )时间,以使树的深度为 O (log n ),并且每个区域的纵横比由一个常数限制,该常数仅取决于之间的角度分区向量。像先前介绍的BAR树一样,它们也具有有界的长宽比,PBAR树在解决许多近似空间查询(例如(1 +&epsi))最近邻居查询和&epsi-range查询所花费的时间上具有良好的上限。 。但是,由于其宽高比的边界比BAR树的边界要好,因此期望它们在解决这些查询方面表现更好。我们用(1 +ε)-最近邻居查询给出的实验结果表明PBAR树优于BAR树。我们还介绍了 k -d树的一种特殊形式,该树可以非常快速地从多维空间中的大型数据集中检测异常值。离群值是不符合数据一般行为的数据对象。我们的方案不假设任何概率模型,并且在对象数量和维度数量上是线性的。它还不假定输入空间中存在一维距离度量。它适用于许多应用,包括在天文数据库中检测异常值,而现有方案均无法解决这些异常。我们提供的经验结果表明,我们的方案实际上是一种实际的解决方案。

著录项

  • 作者

    Chaudhary, Amitabh.;

  • 作者单位

    The Johns Hopkins University.;

  • 授予单位 The Johns Hopkins University.;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 2003
  • 页码 85 p.
  • 总页数 85
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类 自动化技术、计算机技术;
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号