Published in: 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Supporting User-Defined Subsetting and Aggregation over Parallel NetCDF Datasets

Abstract

Disseminating scientific data is becoming crucial for facilitating scientific discoveries, but a key challenge for these efforts is that dataset sizes continue to grow rapidly. Because wide-area data transfer bandwidths and disk retrieval speeds are growing at a much slower pace, it is becoming extremely hard for scientists to download, manage, and process scientific datasets. We have developed a lightweight data management tool that allows server-side subsetting and aggregation on scientific datasets stored in a native format. While our approach is more general, this paper describes an implementation specific to NetCDF, which is one of the most popular scientific data formats. To support a variety of queries efficiently, our tool generates code for pre-filtering and post-filtering, and parallelizes selection and aggregation queries using novel algorithms. We have extensively evaluated our implementation and compared its performance and functionality against OPeNDAP. We demonstrate that even for subsetting queries that are directly supported in OPeNDAP, the sequential performance of our system is better. In addition, our system is capable of supporting a larger variety of queries, scaling performance by parallelizing the queries, and reducing wide-area data transfers through server-side data aggregation.
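
The abstract does not give the tool's algorithms, so the following is only a minimal illustrative sketch of the general idea it names: subset on dimension ranges first (pre-filtering), apply a value predicate afterwards (post-filtering), and compute an aggregate in parallel by combining per-partition partial results so that only a single number would need to leave the server. Everything here is an assumption made for illustration: the in-memory `TEMP` grid stands in for a NetCDF variable, and `partial_sum_count` and `parallel_mean` are hypothetical names, not part of the paper's system or of any NetCDF library.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one NetCDF variable laid out as temp[time][lat].
TEMP = [
    [10.0, 12.0, 14.0, 16.0],
    [11.0, 13.0, 15.0, 17.0],
    [20.0, 22.0, 24.0, 26.0],
    [21.0, 23.0, 25.0, 27.0],
]


def partial_sum_count(rows, lat_range, min_value):
    """Return (sum, count) for one partition of time steps."""
    lo, hi = lat_range
    total, count = 0.0, 0
    for row in rows:
        for value in row[lo:hi]:      # pre-filter: subset by dimension index
            if value >= min_value:    # post-filter: subset by data value
                total += value
                count += 1
    return total, count


def parallel_mean(data, time_range, lat_range, min_value, workers=2):
    """Mean of the selected values, computed from per-partition partials."""
    t0, t1 = time_range
    selected = data[t0:t1]
    # Split the selected time steps into roughly equal contiguous partitions.
    chunk = max(1, -(-len(selected) // workers))  # ceiling division
    parts = [selected[i:i + chunk] for i in range(0, len(selected), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(
            pool.map(lambda p: partial_sum_count(p, lat_range, min_value), parts)
        )
    # Combine step: sums and counts are additive across partitions.
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    # Time steps 1..3, latitude indices 1..2, values of at least 15.0.
    print(parallel_mean(TEMP, (1, 4), (1, 3), 15.0))
```

Carrying a (sum, count) pair per partition, rather than a per-partition mean, is what keeps the combined result exact when partitions contain different numbers of matching values.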