Conference: International Conference on High Performance Computing

Scalable Proximity-Based Methods for Large-Scale Analysis of Atom Probe Data



Abstract

Powered by recent advances in data acquisition technologies, today's state-of-the-art atom probe microscopes yield data sets ranging in size from a few million atoms to billions of atoms. Analyzing these atomic data sets within reasonable turnaround times is a pressing challenge for materials scientists, whose current software systems do not scale to such massive data sets. Here, we present the shared-memory component of a larger ongoing effort to develop a multi-feature data analysis framework capable of analyzing atom probe data of all sizes and scales, from desktop multicore machines to large-scale high-performance computing platforms with hybrid (shared and distributed memory) architectures. Our focus here is on a broad class of popular atom probe data analysis methods whose core, time-consuming operation is the k-NN query. We present a scalable, heuristic algorithm for k-NN queries using three-dimensional range trees. To demonstrate its efficacy, the k-NN algorithm is integrated with two use cases of atom probe data analysis methods, and the resulting analysis times are shown to speed up by over 20X on a 32-core Cray XC40 node using workloads of up to 8 million atoms, which is already beyond the at-scale capabilities of existing atom probe software. Using this k-NN algorithm, we also introduce a novel parameter estimation method for a class of cluster-finding methods, called friends-of-friends (FoF) methods, that completely bypasses their expensive pre-processing steps. In each case, we validate the results on a variety of control data sets.
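
As a point of reference for the two building blocks named in the abstract, the following is a minimal Python sketch of a k-NN query and of friends-of-friends clustering on 3D points. It is not the paper's range-tree algorithm or its parameter estimation method: both functions use brute-force distance computations, so they run in quadratic time and only illustrate what the operations compute. The function names `knn` and `friends_of_friends`, the `linking_length` parameter, and the toy coordinates are assumptions made for illustration.

```python
import math
from itertools import combinations


def knn(points, query_index, k):
    """Return indices of the k points nearest to points[query_index].

    Brute force: every other point is ranked by Euclidean distance.
    """
    q = points[query_index]
    others = [i for i in range(len(points)) if i != query_index]
    others.sort(key=lambda i: math.dist(q, points[i]))
    return others[:k]


def friends_of_friends(points, linking_length):
    """Assign a cluster label to every point.

    Two points share a label if they lie within linking_length of each
    other, directly or through a chain of such neighbors (union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= linking_length:
            parent[find(i)] = find(j)

    return [find(i) for i in range(len(points))]


# Toy data (hypothetical): two well-separated groups of "atoms".
atoms = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
         (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
nearest = knn(atoms, 0, 2)
labels = friends_of_friends(atoms, 0.2)
```

With a linking length of 0.2, the first three atoms end up with one label and the last two with another. A spatial index such as the range trees mentioned in the abstract would replace the all-pairs loops here; this sketch makes no claim about how the paper does so.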
