Conference: IEEE International Conference on Data Mining Workshops
Parallel k-Means Clustering of Geospatial Data Sets Using Manycore CPU Architectures

Abstract

The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery and mining of weather, climate, ecological, and other geoscientific data sets fused from disparate sources. Many of the standard tools used on individual workstations are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of parallelism available in state-of-the-art high-performance computing platforms can enable such analysis. Here, we describe pKluster, an open-source tool we have developed for accelerated k-means clustering of geospatial and geospatiotemporal data, and discuss algorithmic modifications and code optimizations we have made to enable it to effectively use parallel machines based on novel CPU architectures (such as the Intel Knights Landing Xeon Phi and Skylake Xeon processors) that have many cores and hardware threads and employ significant single instruction, multiple data (SIMD) parallelism. We outline some applications of the code in ecology and climate science contexts and present a detailed discussion of the performance of the code for one such application, LiDAR-derived vertical vegetation structure classification.
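The abstract does not give pKluster's implementation, so the following is only a minimal sketch of the general data-parallel pattern that parallel k-means codes commonly follow: split the points into chunks, let each worker assign its points to the nearest centroid and accumulate per-cluster partial sums, then reduce the partial results to update the centroids. All names here (`assign_chunk`, `kmeans`, `n_chunks`) are hypothetical and are not taken from pKluster.

```python
from concurrent.futures import ThreadPoolExecutor


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_chunk(chunk, centroids):
    """Assign each point in one chunk to its nearest centroid.

    Returns per-cluster partial coordinate sums and point counts,
    which the caller reduces across all chunks.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def kmeans(points, centroids, n_chunks=4, max_iter=100):
    """Lloyd-style k-means with a chunked assignment step (illustrative only)."""
    centroids = [list(c) for c in centroids]
    k, dim = len(centroids), len(centroids[0])
    size = max(1, -(-len(points) // n_chunks))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for _ in range(max_iter):
            # Map: each chunk is processed independently of the others.
            partials = list(pool.map(lambda ch: assign_chunk(ch, centroids), chunks))
            # Reduce: combine partial sums and counts from all chunks.
            sums = [[0.0] * dim for _ in range(k)]
            counts = [0] * k
            for part_sums, part_counts in partials:
                for c in range(k):
                    counts[c] += part_counts[c]
                    for d in range(dim):
                        sums[c][d] += part_sums[c][d]
            # Update: an empty cluster keeps its previous centroid.
            new_centroids = [
                [s / counts[c] for s in sums[c]] if counts[c] else centroids[c]
                for c in range(k)
            ]
            if new_centroids == centroids:  # converged: centroids unchanged
                break
            centroids = new_centroids
    return centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Python threads are used here only to show the map-and-reduce structure; because of the interpreter lock they will not deliver the speedup that a compiled, threaded, and vectorized implementation on a manycore CPU aims for.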