Journal: Future Generation Computer Systems

Towards efficient data search and subsetting of large-scale atmospheric datasets


Abstract

Discovering the correct dataset in an efficient fashion is critical for effective simulations in the atmospheric sciences. Unlike text-based web documents, many of the large scientific datasets often contain binary encoded data that is hard to discover using popular search engines. In the atmospheric sciences, there has been a significant growth in public data hosting services. However, the ability to index and search has been limited by the metadata provided by the data host. We have developed an infrastructure - Atmospheric Data Discovery System (ADDS) - that provides an efficient data discovery environment for observational datasets in the atmospheric sciences. To support complex querying capabilities, we automatically extract and index fine-grained metadata. Datasets are indexed based on periodic crawling of popular sites and also of files requested by the users. Users are allowed to access subsets of a large dataset through our data customization feature. Our focus is the overall architecture, data subsetting scheme, and a performance evaluation of our system.
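The abstract does not describe how ADDS is implemented, so the following is only a minimal sketch of the general idea of indexing fine-grained metadata and then subsetting a dataset, not the authors' method. The names `Record`, `MetadataIndex`, and `subset`, the choice of metadata fields, and the in-memory dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    """Hypothetical fine-grained metadata for one variable in one file."""
    url: str
    variable: str
    start: datetime
    end: datetime
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float


class MetadataIndex:
    """Toy in-memory index keyed by variable name (an assumption, not ADDS)."""

    def __init__(self):
        self._by_variable = {}

    def add(self, record):
        # Group records by variable so a query only scans relevant entries.
        self._by_variable.setdefault(record.variable, []).append(record)

    def search(self, variable, start, end, lat, lon):
        """Return records for `variable` whose time range overlaps
        [start, end] and whose bounding box contains (lat, lon)."""
        return [
            r for r in self._by_variable.get(variable, [])
            if r.start <= end and r.end >= start
            and r.lat_min <= lat <= r.lat_max
            and r.lon_min <= lon <= r.lon_max
        ]


def subset(samples, start, end):
    """Keep only the (timestamp, value) samples within [start, end]."""
    return [(t, v) for t, v in samples if start <= t <= end]
```

In this sketch a query first narrows the candidates by variable name, then filters on temporal overlap and spatial containment. `subset` stands in for the data customization step by returning only the requested time window rather than the whole series.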
