Compilation, locality optimization, and managed distributed execution of scientific dataflows.

Abstract

Supercomputing and other high-performance computing technologies have achieved high computational throughput in geoscience atmospheric, land, and ocean modeling, but have ignored the problem of processing and analyzing the resulting model predictions at a similar scale. Reductive data analysis is severely limited by the financial and time costs of large-scale data transfer. Scientific workflow frameworks let scientists leverage grid-scale resources, but they are too complex for individual scientists to use, despite the availability of graphical tools.

To address the rapidly growing amount of data and the growing desire to share and use one another's data, this research has made three major contributions. First, shell compilation is introduced as a feasible method for optimizing, sandboxing, and porting shell scripts, which are programs of programs. Shell compilation allows scientists to reuse their existing analysis scripts and exploit parallel and distributed computing technology with minimal, if any, porting effort. The application of standard compilation techniques at this higher level is described, noting the new semantic differences and potential benefits (automatic program-level parallelism) that arise. Second, the ability to compile scripts is applied in geoscience to automatically convert scripts into scientific workflows, making it possible to transparently distribute computation to remote data servers and to reduce or eliminate unnecessary data download. The resulting system, Script Workflow Analysis for MultiProcessing (SWAMP), dynamically schedules and executes workflows, dispatching commands among cluster machines with particular attention to data locality and to minimizing internal data transfer, a feature that is especially important for data-intensive workloads. SWAMP's performance is shown to be effective on real geoscience data-reduction analysis scripts.
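The abstract does not show how a script becomes a workflow. The following is only a minimal sketch under stated assumptions, not SWAMP's implementation. It assumes each script command has already been reduced to explicit sets of input and output files (a real shell compiler would have to infer these from each tool's arguments). It then derives dependencies from file conflicts, groups independent commands into waves that could run in parallel, and sends each command to the server that already holds the most of its input bytes. The `Command` class, the function names, the server names, and the byte-count heuristic are all hypothetical (Python 3.9+).

```python
from dataclasses import dataclass


@dataclass
class Command:
    """One script line with its file reads and writes made explicit (hypothetical model)."""
    text: str
    inputs: set[str]
    outputs: set[str]


def build_dataflow(commands):
    """Map each command index to the earlier commands it must wait for.

    Edges come from read-after-write, write-after-write and write-after-read
    conflicts on file names, as in classic compiler dependence analysis.
    """
    deps = {j: set() for j in range(len(commands))}
    last_writer = {}  # file -> index of the most recent command that wrote it
    readers = {}      # file -> commands that read it since that write
    for j, cmd in enumerate(commands):
        for f in cmd.inputs:
            if f in last_writer:
                deps[j].add(last_writer[f])        # read-after-write
        for f in cmd.outputs:
            if f in last_writer:
                deps[j].add(last_writer[f])        # write-after-write
            deps[j].update(readers.get(f, set()))  # write-after-read
        for f in cmd.inputs:
            readers.setdefault(f, set()).add(j)
        for f in cmd.outputs:
            last_writer[f] = j
            readers[f] = set()
    return deps


def parallel_waves(deps):
    """Group command indices into waves whose members can run concurrently."""
    level = {}
    for j in sorted(deps):  # edges only point backwards, so index order is topological
        level[j] = 1 + max((level[i] for i in deps[j]), default=-1)
    waves = {}
    for j in sorted(level):
        waves.setdefault(level[j], []).append(j)
    return [waves[k] for k in sorted(waves)]


def pick_host(cmd, location, size):
    """Choose the server that already holds the most input bytes for cmd."""
    local_bytes = {}
    for f in sorted(cmd.inputs):  # sorted so ties break deterministically
        host = location.get(f)
        if host is not None:
            local_bytes[host] = local_bytes.get(host, 0) + size.get(f, 1)
    if not local_bytes:
        return None
    return max(local_bytes, key=local_bytes.get)


def schedule(commands, location, size, default_host="localhost"):
    """Wave by wave, assign each command to its locality-preferred host."""
    location = dict(location)  # copy: outputs are recorded without touching the caller's map
    plan = []
    for wave in parallel_waves(build_dataflow(commands)):
        assigned = []
        for j in wave:
            host = pick_host(commands[j], location, size) or default_host
            for f in commands[j].outputs:
                location[f] = host  # outputs are created where the command ran
            assigned.append((j, host))
        plan.append(assigned)
    return plan


commands = [
    Command("average A.nc -> a.nc", {"A.nc"}, {"a.nc"}),
    Command("average B.nc -> b.nc", {"B.nc"}, {"b.nc"}),
    Command("difference a.nc b.nc -> d.nc", {"a.nc", "b.nc"}, {"d.nc"}),
]
location = {"A.nc": "server1", "B.nc": "server2"}
size = {"A.nc": 800, "B.nc": 500, "a.nc": 8, "b.nc": 5}
print(schedule(commands, location, size))
# [[(0, 'server1'), (1, 'server2')], [(2, 'server1')]]
```

Commands 0 and 1 touch disjoint files, so they fall into the same wave and each runs next to its large input. Only the small intermediate `b.nc` would have to move to `server1` for command 2.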
Third, the characteristics of I/O-constrained workloads are analyzed and described, along with a technique for explicitly caching files in memory and a new partitioning algorithm, Independent Set Partitioning (InSeP), whose simple, high-level approach based on set operations can be applied to dynamically scheduled workflows.
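The abstract names InSeP but does not describe its steps, so the following is not InSeP. It is only a generic illustration, under assumed inputs, of how plain set operations can partition a workflow. Commands that touch a common file (detected with set intersection) are merged into one group with set union, so different groups never exchange files and each group could be placed on a single machine. Because commands are folded in one at a time, the same loop could run incrementally as a dynamic scheduler releases them. The function name and input format are hypothetical.

```python
def partition_by_shared_files(commands):
    """Split commands into groups that share no files with one another.

    commands maps a command name to the set of files it reads or writes.
    Groups are kept pairwise disjoint, so each new command only has to be
    compared against existing groups with a set intersection.
    """
    groups = []  # list of (command names, files touched) pairs
    for name, files in commands.items():
        merged_names, merged_files = {name}, set(files)
        kept = []
        for names, group_files in groups:
            if group_files & files:  # shares data: must join the same group
                merged_names |= names
                merged_files |= group_files
            else:
                kept.append((names, group_files))
        kept.append((merged_names, merged_files))
        groups = kept
    return groups


groups = partition_by_shared_files({
    "c1": {"A.nc", "a.nc"},
    "c2": {"B.nc", "b.nc"},
    "c3": {"a.nc", "b.nc", "d.nc"},
    "c4": {"C.nc", "c.nc"},
})
print(sorted(sorted(names) for names, _ in groups))
# [['c1', 'c2', 'c3'], ['c4']]
```

Here `c3` reads outputs of both `c1` and `c2`, which pulls all three into one group, while `c4` stays independent.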