Design and analysis of a multi-dimensional data sampling service for large scale data analysis applications

Abstract

Sampling is a widely used technique to increase efficiency in database and data mining applications operating on large datasets. In this paper, we present a scalable sampling implementation that supports efficient, multi-dimensional spatio-temporal sample generation on dynamic, large-scale datasets stored on a storage cluster. The proposed algorithm leverages Hilbert space-filling curves to provide an approximate linear order of multidimensional data while maintaining spatial locality. This new implementation is then bootstrapped on top of our previous implementation, which efficiently samples large datasets along a single dimension (e.g., time), thereby realizing a service for spatio-temporal sampling. We evaluate the performance of our approach by comparing it with the popular R-tree-based technique. The experimental results show that our approach achieves up to an order of magnitude higher efficiency and scalability.
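To illustrate how a Hilbert space-filling curve can impose an approximate linear order on multidimensional data while keeping nearby points close together, here is a minimal Python sketch. It is not the paper's implementation: the function names `hilbert_index` and `sample_along_curve`, the 2-D power-of-two grid, and the every-k-th-point sampling rule are all assumptions made for illustration. The index computation follows the standard bitwise Hilbert mapping for a square grid.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the Hilbert curve, in the range 0 .. n*n - 1."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant at this scale contributes a block of s*s positions.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-curve in this quadrant is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def sample_along_curve(points, n, step):
    """Hypothetical sampler (an assumption, not the paper's method):
    order points by Hilbert index, then keep every `step`-th point.
    Because the curve preserves locality, the kept points are spread
    across the grid rather than clustered in one region."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    return ordered[::step]


# Usage: sample a 4x4 grid at a rate of 1 in 4.
cells = [(x, y) for x in range(4) for y in range(4)]
print(sample_along_curve(cells, 4, 4))  # [(0, 0), (0, 2), (2, 2), (3, 1)]
```

In this toy run the four sampled cells fall one per quadrant of the grid, which is the locality property the abstract relies on. A one-dimensional sampler, such as the time-based one the abstract mentions, could then in principle operate on the Hilbert index as if it were a single dimension.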
