Solving approximate similarity queries

Abstract

Supporting similarity search in data repositories helps satisfy users' information needs, rather than only the data needs addressed by conventional DBMSs, and is desirable for many modern database applications. However, when the repository contains high-dimensional data, solutions to the similarity search problem become cost-inefficient because of the so-called dimensionality curse: it has been observed and shown that in high-dimensional data spaces the probability of overlap between a query and the data regions of a multidimensional access method (MAM) is very high. Hence, executing a similarity query may require accessing a vast number of data regions, and the performance of MAMs decreases significantly. Approximate similarity search has been introduced to reduce the complexity of the problem. However, most research so far focuses mainly on approximate nearest neighbor (NN) and range queries in a single-feature data space. In practice, multiple-condition queries appear more frequently and are more complicated to handle. In this article, we present efficient approaches to three types of approximate similarity queries: approximate multi-feature NN, approximate single-feature NN, and approximate range queries. Specifically, we use the Vague Query System, one of the flexible query answering systems for conventional DBMSs, as a case study to illustrate and establish the practical value of our proposed solutions. Experimental results with both synthetic and real data sets confirm the efficiency of these solutions.
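
The overlap effect described in the abstract can be illustrated with a standard back-of-the-envelope calculation. The sketch below is not taken from the article; it assumes data spread uniformly over the unit hypercube and a hypercube-shaped range query, and the function name `query_side_length` is hypothetical. Under those assumptions, a query that covers a fixed fraction of the space must span almost the full extent of every dimension as the dimensionality grows, so it intersects nearly every data region of an access method.

```python
def query_side_length(selectivity: float, dim: int) -> float:
    """Edge length of a hypercube query that covers `selectivity`
    (a fraction of the volume) of the unit hypercube [0, 1]^dim.

    Illustrative assumption: uniformly distributed data, so the
    covered volume fraction equals the fraction of data retrieved.
    """
    if not 0.0 < selectivity <= 1.0:
        raise ValueError("selectivity must be in (0, 1]")
    if dim < 1:
        raise ValueError("dim must be a positive integer")
    # Volume of a hypercube with edge s is s**dim, so s = selectivity**(1/dim).
    return selectivity ** (1.0 / dim)


if __name__ == "__main__":
    # A query retrieving 1% of the space, at increasing dimensionality.
    for dim in (2, 10, 50, 100):
        side = query_side_length(0.01, dim)
        print(f"dim={dim:3d}  edge length per dimension={side:.3f}")
```

With a selectivity of 1%, the required edge length is 0.1 in two dimensions but exceeds 0.95 in one hundred dimensions, which is one way to see why exact similarity queries end up touching most data regions and why approximate answers become attractive.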
