首页> 外文会议>IEEE International Conference on Data Mining >Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search
【24h】

Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search

机译:菱形采样近似最大全对DOT-MARES(MAD)搜索

获取原文

摘要

Given two sets of vectors, A = { a_1,..., a_m} and B = { b_1,..., b_n}, our problem is to find the top-t dot products, i.e., the largest |a_i · b_j| among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative filtering. We propose a sampling-based approach that avoids direct computation of all mn dot products. We select diamonds (i.e., four-cycles) from the weighted tripartite representation of A and B. The probability of selecting a diamond corresponding to pair (i, j) is proportional to (a_i · b_j)~2, amplifying the focus on the largest-magnitude entries. Experimental results indicate that diamond sampling is orders of magnitude faster than direct computation and requires far fewer samples than any competing approach. We also apply diamond sampling to the special case of maximum inner product search, and get significantly better results than the state-of-the-art hashing methods.
机译:给定两个矢量集,A = {A_1,...,A_M}和B = {B_1,......,B_N},我们的问题是要找到前t个点产品,也就是说,最大的| A_I·b_j |在所有可能的对。这是出现在涉及相似性搜索,链接的预测和协同过滤大量数据的应用程序的基本数学问题。我们提出了一个基于采样的办法,避免了所有的MN点产品直接计算。我们选择从A的加权三方表示和B.选择对应于一对钻石的概率钻石(即四个周期)(I,J)是正比于(A_I·b_j)〜2,放大的焦点上最大幅度的条目。实验结果表明,钻石采样数量级比直接计算速度更快,需要比任何竞争的做法样本少得多。我们也采用钻石采样最大内商品搜索的特殊情况,并获得比国家的最先进的哈希方法显著更好的结果。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号