Journal: IEEE Transactions on Knowledge and Data Engineering

C2Net: A Network-Efficient Approach to Collision Counting LSH Similarity Join

Abstract

Similarity join of two datasets P and Q is a primitive operation that is useful in many application domains. The operation involves identifying pairs (p,q) in the Cartesian product of P and Q such that (p,q) satisfies a stipulated similarity condition. In a high-dimensional space, an approximate similarity join based on locality-sensitive hashing (LSH) provides a good solution while reducing the processing cost with a predictable loss of accuracy. A distributed processing framework such as MapReduce allows the handling of large and high-dimensional datasets. However, network cost estimation frequently turns into a bottleneck in a distributed processing environment, thus resulting in a challenge of achieving faster and more efficient similarity join. This paper focuses on collision counting LSH-based similarity join in MapReduce and proposes a network-efficient solution called C2Net to improve the utilization of MapReduce combiners. The solution uses two graph partitioning schemes: (i) minimum spanning tree for organizing LSH buckets replication; and (ii) spectral clustering for runtime collision counting task scheduling. Experiments have shown that, in comparison to the state of the art, the proposed solution is able to achieve 20 percent data reduction and 50 percent reduction in shuffle time.
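The collision-counting idea behind the join can be illustrated with a minimal single-process sketch. This is not C2Net itself: it has no MapReduce stage, no combiners, no minimum spanning tree, and no spectral clustering. It only shows how candidate pairs (p,q) are selected by counting how many hash functions place p and q in the same bucket. The Gaussian random-projection hash family and the names `m`, `w`, and `threshold` are assumptions made for illustration; the abstract does not specify them.

```python
import math
import random
from collections import defaultdict
from itertools import product


def make_hashes(dim, m, w, seed=0):
    """Draw m random-projection hash functions (a, b); an assumed hash family."""
    rng = random.Random(seed)
    return [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
        for _ in range(m)
    ]


def bucket_id(point, a, b, w):
    """Bucket index of a point: floor((a . x + b) / w)."""
    dot = sum(ai * xi for ai, xi in zip(a, point))
    return math.floor((dot + b) / w)


def collision_counting_join(P, Q, m=32, w=4.0, threshold=24, seed=0):
    """Return index pairs (i, j) whose points collide under >= threshold hashes."""
    hashes = make_hashes(len(P[0]), m, w, seed)
    counts = defaultdict(int)  # (i, j) -> number of hash functions that collide
    for a, b in hashes:
        # Each bucket holds the ids from P and the ids from Q that landed in it.
        buckets = defaultdict(lambda: ([], []))
        for i, p in enumerate(P):
            buckets[bucket_id(p, a, b, w)][0].append(i)
        for j, q in enumerate(Q):
            buckets[bucket_id(q, a, b, w)][1].append(j)
        # Every P-Q pair sharing a bucket counts as one collision.
        for p_ids, q_ids in buckets.values():
            for pair in product(p_ids, q_ids):
                counts[pair] += 1
    return sorted(pair for pair, c in counts.items() if c >= threshold)


if __name__ == "__main__":
    P = [(0.0, 0.0), (10.0, 10.0)]
    Q = [(0.0, 0.0), (50.0, -50.0)]
    print(collision_counting_join(P, Q))
```

In a distributed setting, the per-bucket pair counts in the inner loop are the intermediate records that would be shuffled across the network, which is the cost the abstract says C2Net reduces through better use of combiners.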
