Published in: International Conference on Similarity Search and Applications

The Power of Distance Distributions: Cost Models and Scheduling Policies for Quality-Controlled Similarity Queries


Abstract

Approximate similarity queries are a practical way to obtain good, yet suboptimal, results from large data sets without having to pay high execution costs. In this paper we analyze how the strategy for searching through an index tree, also called the scheduling policy, can influence costs. We consider quality-controlled similarity queries, in which the user sets a quality (distance) threshold θ and the system halts as soon as it finds k objects in the data set at distance ≤ θ from the query object. After providing experimental evidence that the scheduling policy might indeed have a high impact on paid costs, we characterize the policies' behavior through an analytical cost model, in which a major role is played by parameterized local distance distributions. Such distributions are also the key to deriving new scheduling policies, which we show to be optimal in a simplified, yet relevant, scenario.
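
The query type described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm or cost model: the index tree is simplified to a flat list of leaf nodes with bounding balls, cost is counted as the number of leaves accessed, and all names (`Leaf`, `quality_controlled_search`, `center_distance_policy`, `storage_order_policy`) are hypothetical. The two policies are illustrative stand-ins, chosen only to show that the visiting order can change the cost of the same query.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


class Leaf:
    """Hypothetical leaf node: a set of points plus a bounding ball."""

    def __init__(self, points: Sequence[Point]) -> None:
        self.points = list(points)
        dims = len(self.points[0])
        n = len(self.points)
        self.center = tuple(sum(p[i] for p in self.points) / n for i in range(dims))
        self.radius = max(math.dist(self.center, p) for p in self.points)


def lower_bound(query: Point, leaf: Leaf) -> float:
    # By the triangle inequality, no point in the leaf is closer than this.
    return max(0.0, math.dist(query, leaf.center) - leaf.radius)


def center_distance_policy(query: Point, leaf: Leaf) -> float:
    # Illustrative policy: visit leaves whose center is closest first.
    return math.dist(query, leaf.center)


def storage_order_policy(query: Point, leaf: Leaf) -> float:
    # Illustrative baseline: constant key, so the stable sort keeps list order.
    return 0.0


def quality_controlled_search(
    query: Point,
    leaves: Sequence[Leaf],
    k: int,
    theta: float,
    policy: Callable[[Point, Leaf], float],
) -> Tuple[List[Point], int]:
    """Return (objects found, leaves accessed); halt once k objects lie within theta."""
    found: List[Point] = []
    visited = 0
    for leaf in sorted(leaves, key=lambda node: policy(query, node)):
        if lower_bound(query, leaf) > theta:
            continue  # cannot contain a qualifying object, so no cost is paid
        visited += 1
        for p in leaf.points:
            if math.dist(query, p) <= theta:
                found.append(p)
                if len(found) == k:
                    return found, visited
    return found, visited


leaves = [Leaf([(0.0, 0.0), (6.0, 0.0)]), Leaf([(5.0, 0.0), (5.5, 0.0)])]
query = (5.2, 0.0)
good = quality_controlled_search(query, leaves, 1, 0.5, center_distance_policy)
poor = quality_controlled_search(query, leaves, 1, 0.5, storage_order_policy)
print(good[1], poor[1])  # leaves accessed under each policy
```

In this toy data set both policies return the same object, but the storage-order policy first accesses a leaf that overlaps the query region without containing any qualifying object, so it pays for two leaf accesses instead of one.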
