Evaluation of Flash-Based Out-of-Core Stencil Computation Algorithms for SSD-Equipped Clusters


Abstract

This paper proposes a new scheme for large-scale stencil computations whose data size exceeds the total main memory of the nodes in a cluster. It uses flash SSDs distributed over the cluster nodes as an extension of main memory, accessed through a locality-aware algorithm. Three algorithms, each with a different hierarchical blocking scheme for the three memory tiers (flash SSD, DRAM, and cache), are proposed and evaluated on different platforms and flash devices. They exploit highly parallel asynchronous input/output on flash SSDs and use blocking parameters selected by an auto-tuning system named Blk-Tune. They also overcome the performance degradation caused by non-uniform memory access (NUMA) architectures. The algorithms optimized for single nodes are extended to multiple nodes and evaluated on a cluster with traditional SATA SSDs as well as state-of-the-art flash devices, such as low-power, cost-effective M.2 NVMe flash SSDs. With this scheme and distributed flash devices in a cluster, large-scale stencil problems can be solved with a limited number of nodes and a moderate amount of main memory.
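The hierarchical blocking idea in the abstract can be illustrated with a minimal sketch. This is not any of the paper's three algorithms: it assumes a 1D 3-point averaging stencil, a single sweep, and synchronous file reads standing in for the flash tier. The names `stencil_in_core`, `stencil_out_of_core`, `dram_block`, and `cache_block` are hypothetical, and the sketch includes no asynchronous I/O, auto-tuning, or NUMA handling.

```python
import os
import tempfile
from array import array

ITEM = array("d").itemsize  # bytes per stored value


def stencil_in_core(u):
    """Reference: one 3-point averaging sweep, boundary values kept fixed."""
    out = array("d", u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def stencil_out_of_core(src_path, dst_path, n, dram_block, cache_block):
    """One sweep over n values stored in src_path, written to dst_path.

    Outer tier: at most dram_block values plus a one-cell halo on each side
    are read from the file (the stand-in for flash) at a time.
    Inner tier: each resident block is swept in cache_block-sized pieces.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for start in range(0, n, dram_block):
            stop = min(start + dram_block, n)
            lo = max(start - 1, 0)  # left halo cell, if one exists
            hi = min(stop + 1, n)   # right halo cell, if one exists
            src.seek(lo * ITEM)
            buf = array("d")
            buf.fromfile(src, hi - lo)
            out = array("d", bytes(ITEM * (stop - start)))  # zero-filled
            for c0 in range(start, stop, cache_block):
                for i in range(c0, min(c0 + cache_block, stop)):
                    j = i - lo  # index of cell i inside the resident buffer
                    if i == 0 or i == n - 1:
                        out[i - start] = buf[j]  # boundary kept fixed
                    else:
                        out[i - start] = (buf[j - 1] + buf[j] + buf[j + 1]) / 3.0
            out.tofile(dst)


if __name__ == "__main__":
    data = array("d", [float(i % 5) for i in range(23)])
    with tempfile.TemporaryDirectory() as tmp:
        src_file = os.path.join(tmp, "src.bin")
        dst_file = os.path.join(tmp, "dst.bin")
        with open(src_file, "wb") as f:
            data.tofile(f)
        stencil_out_of_core(src_file, dst_file, len(data), dram_block=8, cache_block=3)
        result = array("d")
        with open(dst_file, "rb") as f:
            result.fromfile(f, len(data))
    print(result == stencil_in_core(data))
```

The halo cells are what let each block be computed independently of its neighbours. A scheme aimed at real flash devices would presumably also overlap the reads of the next block with computation on the current one, which this sketch leaves out.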


Bibliographic record

Source
IEEE International Conference on Parallel and Distributed Systems, 2016, pp. 1031-1040 (10 pages)
Authors
Hiroko Midorikawa; Hideyuki Tan
Original format: PDF
Keywords
Decision support systems; Random access memory; Tuning; Conferences; Flash memories; Layout


Similar literature

1. A Data-Centric Directive-Based Framework to Accelerate Out-of-Core Stencil Computation on a GPU [J]. Jingcheng Shen, Fumihiko Ino, Albert Farrés. IEICE Transactions on Information and Systems, 2020, No. 12.
2. PACC: a directive-based programming framework for out-of-core stencil computation on accelerators [J]. Nobuhiro Miki, Fumihiko Ino, Kenichi Hagihara. International Journal of High Performance Computing and Networking, 2019, No. 1.
3. Locally Recursive Non-Locally Asynchronous Algorithms for Stencil Computation [J]. V. D. Levchenko, A. Y. Perepelkina. Lobachevskii Journal of Mathematics, 2018, No. 4.
4. Evaluation of Flash-Based Out-of-Core Stencil Computation Algorithms for SSD-Equipped Clusters [C]. Hiroko Midorikawa, Hideyuki Tan. IEEE International Conference on Parallel and Distributed Systems, 2016.
5. High-performance cluster computing, algorithms, implementations and performance evaluation for computation-intensive applications to promote complex scientific research on turbulent flows [D]. Wang, Hao. 2001.
6. Accuracy of compact-stencil interpolation algorithms for unstructured mesh finite volume solver [O]. Adek Tasri, Anita Susilawati. 2021.
7. High-performance cluster computing, algorithms, implementations and performance evaluation for computation-intensive applications to promote complex scientific research on turbulent flows [O]. Wang, Hao. 2001.
8. Towards an Evaluation of Air Surveillance Track Clustering Algorithms via External Cluster Quality Measures [R]. Lowry, M. C. 2013.
