首页> 外文会议>International Conference on Similarity Search and Applications >Query Filtering with Low-Dimensional Local Embeddings
【24h】

Query Filtering with Low-Dimensional Local Embeddings

机译:具有低维局部嵌入的查询过滤

获取原文
获取外文期刊封面目录资料

摘要

The concept of local pivoting is to partition a metric space so that each element in the space is associated with precisely one of a fixed set of reference objects or pivots. The idea is that each object of the data set is associated with the reference object that is best suited to filter that particular object if it is not relevant to a query, maximising the probability of excluding it from a search. The notion does not in itself lead to a scalable search mechanism, but instead gives a good chance of exclusion based on a tiny memory footprint and a fast calculation. It is therefore most useful in contexts where main memory is at a premium, or in conjunction with another, scalable, mechanism. In this paper we apply similar reasoning to metric spaces which possess the four-point property, which notably include Euclidean, Cosine, Triangular, Jensen-Shannon, and Quadratic Form. In this case, each element of the space can be associated with two reference objects, and a four-point lower-bound property is used instead of the simple triangle inequality. The probability of exclusion is strictly greater than with simple local pivoting; the space required per object and the calculation are again tiny in relative terms. We show that the resulting mechanism can be very effective. A consequence of using the four-point property is that, for m reference points, there are (~m/_2) pivot pairs to choose from, giving a very good chance of a good selection being available from a small number of distance calculations. Finding the best pair has a quadratic cost with the number of references; however, we provide experimental evidence that good heuristics exist. Finally, we show how the resulting mechanism can be integrated with a more scalable technique to provide a very significant performance improvement, for a very small overhead in build-time and memory cost.
机译:局部旋转的概念是划分一个度量空间,以便空间中的每个元素都与一组固定的参考对象或枢轴中的一个精确关联。这种想法是,数据集的每个对象都与参考对象相关联,如果该对象与查询无关,则该参考对象最适合过滤该特定对象,从而最大程度地将其从搜索中排除。这个概念本身并不会导致可扩展的搜索机制,而是基于很小的内存占用和快速的计算而提供了排除的良好机会。因此,在主内存非常宝贵的情况下,或与其他可扩展机制结合使用时,它是最有用的。在本文中,我们对具有四点属性的度量空间应用相似的推理,其中特别包括欧几里得,余弦,三角,詹森-香农和二次形式。在这种情况下,空间的每个元素都可以与两个参考对象相关联,并且使用四点下界属性代替简单的三角形不等式。排除的可能性严格大于简单的局部透视。每个对象所需的空间和计算相对而言仍然很小。我们证明了所产生的机制可能非常有效。使用四点属性的结果是,对于m个参考点,有(〜m / _2)个枢轴对可供选择,从而极有可能从少量距离计算中获得良好的选择。找到最好的一对对与引用数量成正比的二次成本。但是,我们提供的实验证据表明存在良好的启发式方法。最后,我们展示了如何将生成的机制与可扩展性更高的技术集成在一起,从而以非常小的构建时间和内存成本开销提供显着的性能改进。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号