Conference paper, IEEE International Congress on Big Data

Parallel POD Compression of Time-Varying Big Datasets Using m-Swap on the K Computer


Abstract

Thanks to supercomputers, increasingly complicated simulations can now be carried out successfully. However, analyzing and understanding the intrinsic properties of the big datasets these simulations produce has become an urgent research task for scientists, and the explosive size of the datasets makes this task difficult. Reducing the size of big datasets has therefore become an important topic, in which data compression and parallel computing are the two key techniques. In this paper, we present a parallel data compression approach to reduce the size of time-varying big datasets. First, we employ the proper orthogonal decomposition (POD) method for compression. POD extracts the underlying features of a dataset and can thereby greatly reduce its size. Moreover, the compressed datasets can be decompressed linearly, which helps scientists interactively visualize big datasets for analysis. Then, we introduce a novel m-swap method to parallelize the POD compression algorithm effectively. The m-swap method achieves high performance by fully using all parallel processors; in other words, no processor sits idle during parallel compression. Furthermore, the m-swap method greatly reduces the cost of interprocessor communication. This is achieved by controlling the data transfer among 2m processors so as to best balance the computational cost across these processors. Finally, we demonstrate the effectiveness of our method by compressing several time-varying big datasets on the K computer using tens of thousands of processors.
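To make the compression and "linear decompression" steps concrete, here is a minimal serial sketch of truncated POD using the method of snapshots. The sketch rests on several assumptions: the snapshots fit in memory, no temporal mean is subtracted, and a plain cyclic Jacobi eigensolver stands in for a production eigensolver. The function names `pod_compress`, `pod_decompress` and `jacobi_eigen` are hypothetical, and none of this reproduces the paper's exact formulation or its m-swap parallel scheme. For n grid points and T time steps, keeping r modes stores r(n+T) numbers instead of nT, and reconstruction is just a weighted sum of modes.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def jacobi_eigen(a, tol=1e-22, max_sweeps=50):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors); eigenvectors[k] pairs with eigenvalues[k].
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    norm2 = sum(x * x for row in a for x in row)  # invariant under rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= tol * norm2:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], [[v[k][i] for k in range(n)] for i in range(n)]


def pod_compress(snapshots, r):
    """Truncated POD via the method of snapshots (hypothetical helper).

    snapshots: T vectors of length n, one flattened field per time step.
    Returns (modes, coeffs): up to r orthonormal spatial modes (length n)
    and their time coefficients (length T).
    """
    T, n = len(snapshots), len(snapshots[0])
    # T x T temporal correlation matrix; cheap when T << n.
    corr = [[dot(snapshots[i], snapshots[j]) for j in range(T)] for i in range(T)]
    lam, vecs = jacobi_eigen(corr)
    order = sorted(range(T), key=lambda k: lam[k], reverse=True)
    lam_max = lam[order[0]]
    modes, coeffs = [], []
    for k in order[:r]:
        if lam[k] <= 1e-12 * lam_max:  # remaining directions carry no energy
            break
        sigma = math.sqrt(lam[k])
        vk = vecs[k]
        # Spatial mode phi_k = X v_k / sigma_k; coefficient a_k(t) = sigma_k * v_k[t].
        modes.append([sum(vk[t] * snapshots[t][i] for t in range(T)) / sigma for i in range(n)])
        coeffs.append([sigma * vk[t] for t in range(T)])
    return modes, coeffs


def pod_decompress(modes, coeffs):
    """Linear reconstruction: snapshot t ~= sum_k coeffs[k][t] * modes[k]."""
    T, n = len(coeffs[0]), len(modes[0])
    return [[sum(c[t] * m[i] for m, c in zip(modes, coeffs)) for i in range(n)] for t in range(T)]
```

Because decompression is only a linear combination, a viewer could rebuild any single time step on demand from r modes and r scalars. That linearity is the property the abstract ties to interactive visualization. Working with the T-by-T correlation matrix rather than the n-by-n one is a common choice when there are far fewer time steps than grid points.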