Journal: Concurrency and Computation

Parallel membership queries on very large scientific data sets using bitmap indexes



Abstract

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating-point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word-Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.
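
To make the idea of a membership query over a bitmap index concrete, here is a minimal single-process sketch. It is not the authors' implementation: it uses plain Python integers as uncompressed bitmaps, omits Word-Aligned Hybrid compression and parallelization, and the function names and the `particle_type` sample data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def membership_query(index, wanted):
    """Return the row numbers whose attribute value is one of `wanted`."""
    hits = 0
    for value in wanted:
        # OR the per-value bitmaps; values absent from the index contribute nothing.
        hits |= index.get(value, 0)

    # Decode the combined bitmap back into row numbers.
    rows = []
    row = 0
    while hits:
        if hits & 1:
            rows.append(row)
        hits >>= 1
        row += 1
    return rows


# Hypothetical classification attribute with one value per record.
particle_type = ["electron", "proton", "muon", "electron", "photon", "muon"]
index = build_bitmap_index(particle_type)
matches = membership_query(index, {"electron", "muon"})
print(matches)  # [0, 2, 3, 5]
```

The query reduces to a bitwise OR over one bitmap per queried value, so its cost depends on the number of queried values and the bitmap sizes, not on scanning the raw records. A compressed representation such as Word-Aligned Hybrid would replace the integer bitmaps in a real system.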

