首页> 外文会议>Data Engineering (ICDE), 2010 >K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free
【24h】

K nearest neighbor queries and kNN-Joins in large relational databases (almost) for free

机译:免费(大型)关系数据库中的K个最近邻居查询和kNN-Joins

获取原文

摘要

Finding the k nearest neighbors (kNN) of a query point, or a set of query points (kNN-Join) are fundamental problems in many application domains. Many previous efforts to solve these problems focused on spatial databases or stand-alone systems, where changes to the database engine may be required, which may limit their application on large data sets that are stored in a relational database management system. Furthermore, these methods may not automatically optimize kNN queries or kNN-Joins when additional query conditions are specified. In this work, we study both the kNN query and the kNN-Join in a relational database, possibly augmented with additional query conditions. We search for relational algorithms that require no changes to the database engine. The straightforward solution uses the user-defined-function (UDF) that a query optimizer cannot optimize.We design algorithms that could be implemented by SQL operators without changes to the database engine, hence enabling the query optimizer to understand and generate the ¿best¿ query plan. Using only a small constant number of random shifts for databases in any fixed dimension, our approach guarantees to find the approximate kNN with only logarithmic number of page accesses in expectation with a constant approximation ratio and it could be extended to find the exact kNN efficiently in any fixed dimension. Our design paradigm easily supports the kNN-Join and updates. Extensive experiments on large, real and synthetic, data sets confirm the efficiency and practicality of our approach.
机译:查找查询点的k个最近邻居(kNN)或一组查询点(kNN-Join)是许多应用程序领域中的基本问题。解决这些问题的许多以前的努力都集中在空间数据库或独立系统上,其中可能需要更改数据库引擎,这可能会限制它们在关系数据库管理系统中存储的大型数据集上的应用。此外,当指定其他查询条件时,这些方法可能不会自动优化kNN查询或kNN-Joins。在这项工作中,我们在关系数据库中研究了kNN查询和kNN-Join,可能还会增加其他查询条件。我们搜索不需要更改数据库引擎的关系算法。直截了当的解决方案使用查询优化器无法优化的用户定义函数(UDF)。我们设计了可以由SQL运算符实现的算法,而无需更改数据库引擎,从而使查询优化器能够理解并生成Â最佳查询计划。对于任何固定维度的数据库,仅使用少量恒定的数据库随机移位,我们的方法就可以保证仅以对数的访问量以期望的近似比率找到仅具有页面访问次数的对数的近似kNN,并且可以扩展为有效地找到精确的kNN。任何固定尺寸。我们的设计范例轻松支持kNN-Join和更新。在大型,真实和综合数据集上进行的大量实验证实了我们方法的有效性和实用性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号