Journal: Data & Knowledge Engineering

The Merkurion approach for similarity searching optimization in Database Management Systems

Abstract

Modern Database Management Systems (DBMSs) retrieve songs that resemble those in a music dataset, identify plagiarism in a set of documents, or provide physicians with past cases based on the characteristics of a query exam. All such tasks require comparing data by similarity, which can be expressed as distance-based queries in metric spaces. Traditional query processing relies mostly on histograms to describe the data distribution space and to choose a data retrieval path that quickly leads to the answer, avoiding comparisons with most unwanted data. However, DBMSs still lack adequate support for selectivity estimation of query operators over data types embedded in metric spaces. This article presents a novel strategy that extends the query optimizer of a DBMS so that it can also perform both logical and physical query plan optimizations in searches that include similarity predicates. The proposal, named Merkurion, updates the concept of Data Distribution Space and captures data distributions according to the distances between the elements within a dataset. Moreover, it employs concise representations of such distributions, called synopses, to define rules that enable similarity searching optimization. An extensive evaluation of Merkurion on real-world datasets has proven its effectiveness and broad applicability to many data domains.
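
To make the idea of distance-based selectivity estimation concrete, the following is a minimal sketch and not Merkurion's actual synopsis or rule set, which the abstract does not detail. It assumes a plain equi-width histogram over the distances from a single reference element (a hypothetical "pivot") to every element of the dataset, and it estimates the selectivity of a range query centered at that pivot. The names `build_distance_synopsis` and `estimate_selectivity` are illustrative only.

```python
import math


def euclidean(a, b):
    """Euclidean distance; any metric could be substituted here."""
    return math.dist(a, b)


def build_distance_synopsis(data, pivot, dist=euclidean, num_buckets=16):
    """Build an equi-width histogram of distances from `pivot` to each element.

    Assumption: a single pivot and equi-width buckets, chosen for brevity.
    """
    distances = [dist(pivot, x) for x in data]
    max_d = max(distances)
    # Guard against a degenerate dataset where every distance is zero.
    width = max_d / num_buckets if max_d > 0 else 1.0
    counts = [0] * num_buckets
    for d in distances:
        # The largest distance falls on the upper edge, so clamp it
        # into the last bucket.
        idx = min(int(d / width), num_buckets - 1)
        counts[idx] += 1
    return {"width": width, "counts": counts, "n": len(distances)}


def estimate_selectivity(synopsis, radius):
    """Estimate the fraction of elements within `radius` of the pivot.

    Buckets fully inside the radius are counted whole; the bucket that the
    radius cuts through is interpolated linearly (uniformity assumption).
    """
    if radius <= 0:
        return 0.0
    width = synopsis["width"]
    counts = synopsis["counts"]
    full = int(radius // width)  # number of buckets fully covered
    if full >= len(counts):
        return 1.0
    covered = sum(counts[:full])
    partial = (radius - full * width) / width * counts[full]
    return (covered + partial) / synopsis["n"]
```

An optimizer could compare such an estimate against a threshold to decide, for example, whether an index scan or a sequential scan is likely cheaper for a given similarity predicate. That decision rule is likewise an assumption here, not something stated in the abstract.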
