
Specification, configuration and execution of data-intensive scientific applications.


Abstract

Recent advances in digital sensor technology and numerical simulations of real-world phenomena are resulting in the acquisition of unprecedented amounts of raw digital data. Terms like 'data explosion' and 'data tsunami' have come to describe the uncontrolled rate at which scientific datasets are generated by automated sources ranging from digital microscopes and telescopes to in-silico models simulating the complex dynamics of physical and biological processes. Scientists in various domains now have secure, affordable access to petabyte-scale observational data gathered over time, the analysis of which is crucial to scientific discovery and to furthering knowledge within the domain. The availability of commodity components has fostered the development of large distributed systems with high-performance computing resources to support the execution requirements of scientific data analysis applications. Increased levels of middleware support over the years have aimed to provide high scalability of application execution on these systems. However, the high-resolution, multi-dimensional nature of scientific datasets and the complexity of analysis requirements present challenges to efficient application execution on such systems. Traditional brute-force analysis techniques for extracting useful information from scientific datasets may no longer meet desired performance levels at extreme data scales.

This thesis builds on a comprehensive study involving multi-dimensional data analysis applications at large data scales, and identifies a set of advanced factors or parameters of this class of applications which can be exploited in domain-specific ways to obtain substantial improvements in performance. Factors like the on-disk layout of datasets and the mechanisms for accessing them, and the mapping of analysis processes to computational resources, can be customized for performance based on our knowledge of an application's computational and I/O properties. A useful property of these applications is their ability to operate at multiple performance levels based on a set of trade-off parameters, while providing different levels of quality-of-service (QoS) specific to the application instance. To realize the performance benefits brought about by such factors, applications must be configured for execution in specific ways for specific systems. Middleware support for such domain-specific configuration is limited, and there is typically no integration across middleware layers to this end. Low-level manual configuration of applications within a large space of solutions is error-prone and tedious.

This thesis proposes an approach for the development and execution of large scientific multi-dimensional data analysis applications that takes multiple performance parameters into account and supports the notion of domain-specific configuration-as-a-service. My research identifies various aspects that go into the creation of a framework for user-guided, system-directed performance optimizations for such applications. The framework seeks to achieve this goal by integrating software modules that (i) provide a unified, homogeneous model for the high-level specification of any conceptual knowledge that may be used to configure applications within a domain, (ii) perform application configuration in response to user directives, i.e., use the specifications to translate high-level requirements into low-level execution plans optimized for a given system, and (iii) carry out the execution plans on the underlying system in an efficient and scalable manner. A prototype implementation of the framework that integrates several middleware layers is used for evaluating our approach. Experimental results gathered for real-world application scenarios from the domains of astronomy and biomedical imaging demonstrate the utility of our framework towards meeting the scientific performance requirements at very large data scales.
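The three roles the abstract assigns to the framework (specification, configuration, execution) can be pictured with a toy pipeline. The sketch below is only an illustration under stated assumptions, not the thesis's design: the names `Spec`, `Plan`, `configure`, and `execute`, the `min_quality` trade-off parameter, and the prefix-sampling strategy are all hypothetical and do not come from the source.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """High-level request (hypothetical fields, not the thesis's model)."""
    dataset_size: int    # number of data elements available
    min_quality: float   # fraction of elements that must be analyzed, in (0, 1]


@dataclass(frozen=True)
class Plan:
    """Low-level execution plan derived from a Spec for one system."""
    chunks: list         # (start, stop) index ranges to analyze
    workers: int         # degree of parallelism on the target system


def configure(spec: Spec, workers: int, chunk_size: int) -> Plan:
    """Translate a high-level spec into a chunked plan.

    Assumption: quality is traded for speed by analyzing only a prefix
    of the dataset; a real system would choose far richer strategies.
    """
    limit = max(1, int(spec.dataset_size * spec.min_quality))
    chunks = [(s, min(s + chunk_size, limit)) for s in range(0, limit, chunk_size)]
    return Plan(chunks=chunks, workers=workers)


def execute(plan: Plan, data, analyze):
    """Run the analysis function over each chunk, in plan order."""
    with ThreadPoolExecutor(max_workers=plan.workers) as pool:
        return list(pool.map(lambda c: analyze(data[c[0]:c[1]]), plan.chunks))


data = list(range(100))
spec = Spec(dataset_size=len(data), min_quality=0.5)
plan = configure(spec, workers=4, chunk_size=20)
partial_sums = execute(plan, data, sum)
```

Here the user states only what is wanted (`Spec`), while chunking and parallelism are decided by `configure` for the system at hand, which loosely mirrors the separation of concerns described in items (i) to (iii).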

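Bibliographic record

  • Author

    Kumar, Vijay S.

  • Author affiliation

    The Ohio State University

  • Degree-granting institution: The Ohio State University
  • Subject: Engineering Computer.
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 287 p.
  • Total pages: 287
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification:
  • Keywords: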
