International Conference on Algorithms and Architectures for Parallel Processing

Strark-H: A Strategy for Spatial Data Storage to Improve Query Efficiency Based on Spark



Abstract

In this paper, we propose Strark-H, a Spark-based storage and query strategy for large-scale spatial data. It improves the response time of spatial queries by taking into account both the spatial location and the category keywords of spatial objects. First, we define a custom InputFormat class that lets Spark natively read Shapefiles, a common file format for storing spatial data. Next, we propose a partitioning and indexing method for spatial storage. Spatial data is partitioned non-uniformly by spatial position, which keeps each partition no larger than an HDFS block and preserves the spatial proximity of objects across the cluster. On top of this partitioning, a two-level index is built. It consists of a global index over all partitions based on spatial position and a local index within each partition based on object category. Finally, we design a data loading and query scheme based on Strark-H for range queries, k-nearest-neighbor (kNN) queries, and spatial join queries. Extensive experiments on OpenStreetMap (OSM) data show that Strark-H enables Spark to support spatial storage and querying natively, efficiently, and scalably.
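
The abstract states only that Strark-H uses a custom InputFormat so that Spark can read Shapefiles; the InputFormat itself is not shown. The authors' class is presumably a Hadoop `InputFormat` subclass on the JVM, registered with Spark through a call such as `SparkContext.newAPIHadoopFile`. The Python sketch below is not that code. It only decodes the binary layout that such a record reader would have to walk, following the ESRI Shapefile Technical Description. The function names `parse_shp_points` and `build_shp_points` are hypothetical, and only Point geometries are handled. Attribute data such as a category keyword is also left out, because Shapefiles store attributes in the companion .dbf file.

```python
import struct


def parse_shp_points(data: bytes):
    """Decode Point records from the raw bytes of an ESRI .shp file.

    Hypothetical helper for illustration; returns a list of (record_number, x, y).
    """
    # Main header (100 bytes): big-endian file code and file length,
    # little-endian version, shape type and bounding box.
    (file_code,) = struct.unpack(">i", data[0:4])
    if file_code != 9994:
        raise ValueError("not an ESRI shapefile")
    file_len = struct.unpack(">i", data[24:28])[0] * 2  # stored in 16-bit words
    (shape_type,) = struct.unpack("<i", data[32:36])
    if shape_type != 1:
        raise ValueError(f"only Point (type 1) is handled here, got {shape_type}")

    points, offset = [], 100
    while offset < file_len:
        # Record header: big-endian record number and content length (in words).
        rec_no, content_words = struct.unpack(">ii", data[offset:offset + 8])
        content = data[offset + 8:offset + 8 + content_words * 2]
        (rec_type,) = struct.unpack("<i", content[0:4])
        if rec_type == 1:  # Point content: X and Y as little-endian doubles
            x, y = struct.unpack("<dd", content[4:20])
            points.append((rec_no, x, y))
        # rec_type 0 (Null shape) carries no geometry and is skipped
        offset += 8 + content_words * 2
    return points


def build_shp_points(coords):
    """Build an in-memory Point .shp file, used here only to exercise the parser."""
    records = b""
    for i, (x, y) in enumerate(coords, start=1):
        content = struct.pack("<idd", 1, x, y)  # shape type 1 + X + Y = 20 bytes
        records += struct.pack(">ii", i, len(content) // 2) + content
    xs = [c[0] for c in coords] or [0.0]
    ys = [c[1] for c in coords] or [0.0]
    total_words = (100 + len(records)) // 2
    header = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, total_words)
    header += struct.pack("<2i", 1000, 1)  # version 1000, shape type Point
    header += struct.pack("<4d", min(xs), min(ys), max(xs), max(ys))  # bounding box
    header += struct.pack("<4d", 0.0, 0.0, 0.0, 0.0)  # Z and M ranges (unused)
    return header + records


if __name__ == "__main__":
    shp = build_shp_points([(116.39, 39.91), (121.47, 31.23)])
    print(parse_shp_points(shp))  # [(1, 116.39, 39.91), (2, 121.47, 31.23)]
```

Shapefile records are length-prefixed binary data rather than newline-delimited text. This is why Spark's default text input cannot read them directly, and why a dedicated reader has to follow the headers as above.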
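
The abstract describes the partitioning and two-level index in general terms, so the following in-memory sketch is only one way to realize them. It makes these assumptions:

- Partitions come from a KD-tree-style median split on alternating axes. The abstract does not name the authors' split rule.
- Each record has a fixed estimated size (`record_bytes`), which is compared against the HDFS block size (`block_bytes`). The default `dfs.blocksize` is 128 MB in recent Hadoop versions.
- Objects are points of the form `(x, y, category)`.

The global index keeps one minimum bounding rectangle (MBR) per partition, and each local index groups a partition's objects by category. A range query prunes partitions with the global index and then scans only the requested category. The kNN query visits partitions in order of their MBR distance and stops once no unvisited partition can hold a closer object. All names are hypothetical, and nothing here runs on Spark. On a cluster, one natural layout would make each partition an RDD partition and keep the small global index on the driver.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical object layout: (x, y, category), e.g. an OSM point of interest.


def mbr_of(objs):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [o[0] for o in objs]
    ys = [o[1] for o in objs]
    return (min(xs), min(ys), max(xs), max(ys))


def partition(objs, record_bytes, block_bytes, depth=0):
    """Median split on alternating axes until each partition's estimated
    size (object count * record_bytes) fits in one block."""
    if not objs:
        return []
    if len(objs) == 1 or len(objs) * record_bytes <= block_bytes:
        return [objs]
    axis = depth % 2  # 0 splits on x, 1 splits on y
    ordered = sorted(objs, key=lambda o: o[axis])
    mid = len(ordered) // 2
    return (partition(ordered[:mid], record_bytes, block_bytes, depth + 1)
            + partition(ordered[mid:], record_bytes, block_bytes, depth + 1))


def build_index(parts):
    """Global index: (partition id, MBR) per partition.
    Local index: per partition, category -> list of (x, y)."""
    global_index = [(pid, mbr_of(p)) for pid, p in enumerate(parts)]
    local_indexes = []
    for p in parts:
        by_category = defaultdict(list)
        for x, y, category in p:
            by_category[category].append((x, y))
        local_indexes.append(dict(by_category))
    return global_index, local_indexes


def intersects(a, b):
    """True if rectangles a and b overlap (boundaries included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(global_index, local_indexes, window, category):
    """Objects of `category` inside `window` = (min_x, min_y, max_x, max_y)."""
    hits = []
    for pid, mbr in global_index:
        if not intersects(mbr, window):
            continue  # pruned by the global index: partition is never scanned
        for x, y in local_indexes[pid].get(category, []):
            if window[0] <= x <= window[2] and window[1] <= y <= window[3]:
                hits.append((x, y))
    return hits


def mindist(pt, mbr):
    """Lower bound on the distance from pt to any object inside mbr."""
    dx = max(mbr[0] - pt[0], 0.0, pt[0] - mbr[2])
    dy = max(mbr[1] - pt[1], 0.0, pt[1] - mbr[3])
    return math.hypot(dx, dy)


def knn_query(global_index, local_indexes, pt, k, category):
    """k nearest objects of `category` to pt, nearest first."""
    if k <= 0:
        return []
    best = []  # max-heap of the current k best, stored as (-distance, x, y)
    for d, pid in sorted((mindist(pt, mbr), pid) for pid, mbr in global_index):
        if len(best) == k and d > -best[0][0]:
            break  # every remaining partition is farther than the k-th best
        for x, y in local_indexes[pid].get(category, []):
            dist = math.hypot(x - pt[0], y - pt[1])
            if len(best) < k:
                heapq.heappush(best, (-dist, x, y))
            elif dist < -best[0][0]:
                heapq.heapreplace(best, (-dist, x, y))
    return [(x, y) for _, x, y in sorted(best, reverse=True)]


if __name__ == "__main__":
    objs = [(x, y, "cafe" if (x + y) % 3 == 0 else "shop")
            for x in range(10) for y in range(10)]
    parts = partition(objs, record_bytes=100, block_bytes=1200)  # <= 12 per partition
    gidx, lidx = build_index(parts)
    print(sorted(range_query(gidx, lidx, (2, 2, 4, 4), "cafe")))  # [(2, 4), (3, 3), (4, 2)]
    print(sorted(knn_query(gidx, lidx, (0, 0), 3, "shop")))       # [(0, 1), (1, 0), (1, 1)]
```

The median split keeps partitions roughly equal in object count, while their spatial extents shrink in dense areas and grow in sparse ones. This is one plausible reading of the abstract's non-uniform partitioning.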
