Journal: Distributed and Parallel Databases

Exploring spatial datasets with histograms



Abstract

As online spatial datasets grow in both number and sophistication, it becomes increasingly difficult for users to decide whether a dataset suits their tasks, especially when they have no prior knowledge of it. In this paper, we propose browsing as an effective and efficient way to explore the content of a spatial dataset. Browsing lets users see the size of a result set before the query is evaluated at the database, which helps them avoid zero-hit and mega-hit queries and saves time and resources. Although the technique underlying browsing is similar to range-query aggregation and selectivity estimation, browsing spatial datasets poses some unique challenges. We identify a set of spatial relations that browsing applications need to support: the contains, contained, and overlap relations. We prove a lower bound on the storage required to answer queries about the contains relation accurately at a given resolution. We then present three storage-efficient approximation algorithms, which we believe are the first to estimate query results for these spatial relations. We evaluate these algorithms on both synthetic and real-world datasets and show that they provide highly accurate estimates for datasets with various characteristics.
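The abstract does not describe the three approximation algorithms, so the sketch below does not reproduce them. It is a minimal illustration, under stated assumptions, of what a grid-histogram browsing estimate can look like. Assumptions: objects and queries are axis-aligned rectangles in the unit square; each is snapped to the G × G grid cells it touches (its "footprint"); "contains" means the query footprint covers the object footprint, "contained" means the object footprint covers the query footprint, and "overlap" means the two footprints share at least one cell. The names `snap`, `FootprintHistogram`, and `EulerHistogram` are hypothetical. `FootprintHistogram` answers all three relations exactly at grid resolution but keeps one counter per distinct footprint, which can grow on the order of G^4. `EulerHistogram` is a swapped-in, previously known technique (the Euler histogram), not taken from this paper. It needs only (2G - 1)^2 counters and answers grid-aligned overlap counts exactly, but it does not answer contains or contained.

```python
from collections import Counter
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]   # (x1, y1, x2, y2) with x1 <= x2, y1 <= y2
Cells = Tuple[int, int, int, int]          # inclusive cell range (cx1, cy1, cx2, cy2)


def snap(rect: Rect, grid: int, extent: float = 1.0) -> Cells:
    """Map a rectangle to the inclusive range of grid cells it touches (assumed model)."""
    size = extent / grid

    def cell(v: float) -> int:
        return min(max(int(v // size), 0), grid - 1)  # clamp to the grid

    x1, y1, x2, y2 = rect
    return cell(x1), cell(y1), cell(x2), cell(y2)


class FootprintHistogram:
    """Hypothetical exact-at-resolution store: one counter per distinct footprint.

    A G x G grid admits (G * (G + 1) / 2) ** 2 footprints, so storage can reach O(G^4).
    """

    def __init__(self, rects: Iterable[Rect], grid: int, extent: float = 1.0):
        self.grid, self.extent = grid, extent
        self.counts = Counter(snap(r, grid, extent) for r in rects)

    def estimate(self, query: Rect, relation: str) -> int:
        qx1, qy1, qx2, qy2 = snap(query, self.grid, self.extent)
        total = 0
        for (ox1, oy1, ox2, oy2), n in self.counts.items():
            if relation == "contains":     # query footprint covers object footprint
                hit = qx1 <= ox1 and ox2 <= qx2 and qy1 <= oy1 and oy2 <= qy2
            elif relation == "contained":  # object footprint covers query footprint
                hit = ox1 <= qx1 and qx2 <= ox2 and oy1 <= qy1 and qy2 <= oy2
            elif relation == "overlap":    # footprints share at least one cell
                hit = ox1 <= qx2 and qx1 <= ox2 and oy1 <= qy2 and qy1 <= oy2
            else:
                raise ValueError(f"unknown relation: {relation}")
            total += n if hit else 0
        return total


class EulerHistogram:
    """Euler histogram: O(G^2) counters, exact overlap counts for grid-aligned queries.

    Entry (i, j) of the (2G-1) x (2G-1) array is a cell face when i and j are both
    even, an edge when exactly one is odd, and a vertex when both are odd.
    """

    def __init__(self, rects: Iterable[Rect], grid: int, extent: float = 1.0):
        self.grid, self.extent = grid, extent
        n = 2 * grid - 1
        self.h = [[0] * n for _ in range(n)]
        for r in rects:
            cx1, cy1, cx2, cy2 = snap(r, grid, extent)
            # Increment every face, interior edge, and interior vertex of the footprint.
            for i in range(2 * cx1, 2 * cx2 + 1):
                for j in range(2 * cy1, 2 * cy2 + 1):
                    self.h[i][j] += 1

    def overlap(self, query: Rect) -> int:
        qx1, qy1, qx2, qy2 = snap(query, self.grid, self.extent)
        total = 0
        for i in range(2 * qx1, 2 * qx2 + 1):
            for j in range(2 * qy1, 2 * qy2 + 1):
                # Faces - edges + vertices: each intersecting object contributes exactly 1.
                sign = -1 if (i + j) % 2 else 1
                total += sign * self.h[i][j]
        return total


if __name__ == "__main__":
    objs = [(0.05, 0.05, 0.2, 0.2), (0.3, 0.3, 0.7, 0.7),
            (0.1, 0.1, 0.9, 0.9), (0.8, 0.05, 0.95, 0.2)]
    fh, eh = FootprintHistogram(objs, grid=4), EulerHistogram(objs, grid=4)
    q = (0.3, 0.3, 0.7, 0.7)
    print(fh.estimate(q, "contains"), fh.estimate(q, "contained"), eh.overlap(q))  # 1 2 2
```

The Euler histogram's signed sum works because a footprint's intersection with a grid-aligned query is itself a rectangle of cells, whose face count minus interior-edge count plus interior-vertex count is always 1. Each intersecting object therefore adds exactly 1 to the total, regardless of its size.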
