Journal: Concurrency and Computation

Parallel membership queries on very large scientific data sets using bitmap indexes


Abstract

Many scientific applications produce very large amounts of data as advances in hardware fuel computing and experimental facilities. Managing and analyzing massive quantities of scientific data is challenging as data are often stored in specific formatted files, such as HDF5 and NetCDF, which do not offer appropriate search capabilities. In this research, we investigated a special class of search capability, called membership query, to identify whether queried elements of a set are members of an attribute. Attributes that naturally have classification values appear frequently in scientific domains such as category and object type as well as in daily life such as zip code and occupation. Because classification attribute values are discrete and require random data access, performing a membership query on a large scientific data set creates challenges. We applied bitmap indexing and parallelization to membership queries to overcome these challenges. Bitmap indexing provides high performance not only for low cardinality attributes but also for high cardinality attributes, such as floating-point variables, electric charge, or momentum in a particle physics data set, due to compression algorithms such as Word-Aligned Hybrid. We conducted experiments, in a highly parallelized environment, on data obtained from a particle accelerator model and a synthetic data set.
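
To make the idea of a membership query over a bitmap index concrete, the following is a minimal sketch, not the authors' implementation. It assumes an uncompressed index in which each distinct attribute value maps to a Python integer used as a bit vector (bit i is set when row i holds that value). A production system would typically compress these bitmaps, for example with Word-Aligned Hybrid. The function names and the sample zip-code column are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when column[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def membership_query(index, queried_values):
    """Return the sorted row ids whose value is in queried_values (an IN-style query)."""
    hits = 0
    for value in queried_values:
        # OR together the bitmaps of all queried values;
        # values absent from the column contribute no rows.
        hits |= index.get(value, 0)
    rows = []
    row = 0
    while hits:
        if hits & 1:
            rows.append(row)
        hits >>= 1
        row += 1
    return rows


# Hypothetical sample data: a small classification attribute.
zip_codes = ["94720", "10001", "94720", "60601", "10001"]
index = build_bitmap_index(zip_codes)
print(membership_query(index, {"94720", "60601"}))  # prints [0, 2, 3]
```

Because the query reduces to bitwise OR operations over per-value bitmaps, one plausible way to parallelize it is to split the data set into row partitions, index and query each partition independently, and concatenate the results after adding each partition's row offset.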
