Conference: IEEE International Congress on Big Data

DD-Rtree: A dynamic distributed data structure for efficient data distribution among cluster nodes for spatial data mining algorithms

Abstract

Parallelizing data mining algorithms has become a necessity as we try to mine ever-increasing volumes of data. Spatial data mining algorithms such as DBSCAN, OPTICS, and SLINK have been parallelized to exploit a cluster infrastructure. The efficiency achieved by existing algorithms can be attributed to spatial locality preservation: they use spatial indexing structures such as k-d trees, quad-trees, and grid files to distribute data among cluster nodes. However, these indexing structures are static in nature, i.e., they need to scan the entire dataset to determine the partitioning coordinates. This results in a high data distribution cost when the dataset is large. In this paper, we propose a dynamic distributed data structure, DD-Rtree, which preserves spatial locality while distributing data across compute nodes in a shared-nothing environment. Moreover, DD-Rtree is dynamic, i.e., it can be constructed incrementally, making it useful for handling big data. We compare the quality of data distribution achieved by DD-Rtree with that of a recent distributed indexing structure, SD-Rtree. We also compare the efficiency of the queries supported by these indexing structures, along with the overall efficiency of the DBSCAN algorithm. Our experimental results show that DD-Rtree achieves better data distribution, thereby improving overall efficiency.
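The contrast between static and incremental partitioning can be illustrated with a small sketch. This is not the DD-Rtree algorithm, whose details the abstract does not give. It substitutes the classic R-tree "least enlargement" heuristic as a stand-in for locality-preserving incremental routing, and every name in it (`static_split_x`, `IncrementalRouter`, `route`) is hypothetical. The static function needs the whole dataset before it can return a split coordinate, whereas the router assigns each point to a compute node as it arrives.

```python
from statistics import median
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax


def static_split_x(points: List[Point]) -> float:
    """Static (k-d-tree style) split: the median x-coordinate cannot be
    computed until every point has been read."""
    return median(p[0] for p in points)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def extend(r: Rect, p: Point) -> Rect:
    """Smallest rectangle covering both r and p."""
    return (min(r[0], p[0]), min(r[1], p[1]), max(r[2], p[0]), max(r[3], p[1]))


class IncrementalRouter:
    """Hypothetical router (an assumption, not the paper's method): keeps one
    bounding rectangle per compute node and assigns points one at a time."""

    def __init__(self, num_nodes: int) -> None:
        self.mbrs: List[Optional[Rect]] = [None] * num_nodes
        self.counts: List[int] = [0] * num_nodes

    def route(self, p: Point) -> int:
        # Seed empty nodes first so that every node owns a region.
        for i, r in enumerate(self.mbrs):
            if r is None:
                self.mbrs[i] = (p[0], p[1], p[0], p[1])
                self.counts[i] = 1
                return i

        # Otherwise choose the node whose rectangle grows least;
        # ties are broken in favour of the less loaded node.
        def cost(i: int) -> Tuple[float, int]:
            r = self.mbrs[i]
            return (area(extend(r, p)) - area(r), self.counts[i])

        best = min(range(len(self.mbrs)), key=cost)
        self.mbrs[best] = extend(self.mbrs[best], p)
        self.counts[best] += 1
        return best


stream = [(0, 0), (10, 10), (1, 1), (9, 9), (0.5, 0.5)]
router = IncrementalRouter(num_nodes=2)
assignments = [router.route(p) for p in stream]  # decided point by point
```

In this toy stream the points near the origin end up on one node and the points near (10, 10) on the other, without a prior pass over the data. A real distributed index would also have to handle rebalancing, node splits, and query routing, none of which this sketch attempts.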
