Journal: Geoinformatica: An international journal of advances of computer science for geographic

Generalized communication cost efficient multi-way spatial join: revisiting the curse of the last reducer



Abstract

With the rapid growth in the use of smartphones, social media, and sensors, large volumes of location-based data are now available. Location-based data carries important signals about user-intensive information as well as population characteristics. The key analytical tool for location-based analysis is the multi-way spatial join. Unlike conventional join strategies, a multi-way join using map-reduce offers a scalable, distributed computational paradigm and an efficient implementation through communication cost reduction strategies. Controlled Replicate (C-Rep) is a strategy used in the literature to perform the multi-way spatial join efficiently. Although C-Rep outperforms the naive sequential join, careful analysis of its performance reveals that it is plagued by the curse of the last reducer: the skew inherently present in the datasets, together with the skew introduced by the replication operation, causes some reducers to take much longer than others. In this work, we design an algorithm, GEMS (Generalized Communication cost Efficient Multi-Way Spatial Join), to address the skewness inherent in the connectivity of spatial objects while performing a multi-way join. We analyse all the algorithms concerned in terms of I/O and communication costs. We prove that the communication cost of the GEMS approach is better than that of C-Rep by a factor of O(alpha), where alpha is the number of reducers in a single row/column of a grid of reducers. Our experimental results on different datasets indicate that the GEMS approach is three times better than C-Rep in terms of turnaround time.
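To make the notions of replication, communication cost, and reducer skew more concrete, here is a minimal Python sketch. It is not C-Rep and not GEMS, neither of which is specified in the abstract. It assumes a plain "replicate every rectangle to every grid cell it overlaps" scheme on an alpha x alpha grid of reducers, non-negative coordinates, and hypothetical helper names (`cells_for`, `reducer_loads`, `communication_cost`, `skew`). Communication cost is counted as the number of (reducer, rectangle) pairs shipped, and skew as the heaviest reducer's load divided by the mean load.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed >= 0
Cell = Tuple[int, int]                    # (row, column) of a reducer in the grid


def cells_for(rect: Rect, alpha: int, extent: float) -> List[Cell]:
    """Return every cell of an alpha x alpha grid over [0, extent]^2 that rect overlaps."""
    size = extent / alpha
    xmin, ymin, xmax, ymax = rect
    # Clamp to the last row/column so objects on the outer boundary stay in the grid.
    c0 = min(int(xmin // size), alpha - 1)
    c1 = min(int(xmax // size), alpha - 1)
    r0 = min(int(ymin // size), alpha - 1)
    r1 = min(int(ymax // size), alpha - 1)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def reducer_loads(rects: Iterable[Rect], alpha: int, extent: float) -> Counter:
    """Count how many rectangle copies each reducer receives (naive replication)."""
    loads: Counter = Counter()
    for rect in rects:
        for cell in cells_for(rect, alpha, extent):
            loads[cell] += 1
    return loads


def communication_cost(loads: Counter) -> int:
    """Total number of rectangle copies sent from mappers to reducers."""
    return sum(loads.values())


def skew(loads: Counter, alpha: int) -> float:
    """Heaviest reducer load divided by the mean load over all alpha * alpha reducers."""
    mean = communication_cost(loads) / (alpha * alpha)
    return max(loads.values()) / mean
```

With a few rectangles clustered in one corner and one long rectangle spanning several cells, a single reducer ends up with most of the work while the others sit nearly idle. That is the "last reducer" effect the abstract describes, arising both from skew in the data and from skew added by replication.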
