
Data-Driven Programming Abstractions and Optimization for Multi-Core Platforms.



Abstract

Multi-core platforms have spread to all corners of the computing industry, and trends in design and power indicate that the shift to multi-core will become even more widespread in the future. As the number of cores on a chip rises, the complexity of memory systems and on-chip interconnects increases drastically. The programmer inherits this complexity in the form of new responsibilities for task decomposition, synchronization, and data movement within an application, responsibilities that have hitherto been concealed by complex processing pipelines or deemed unimportant since tasks were largely executed sequentially. To some extent, the need for explicit parallel programming is inevitable, due to limits in the instruction-level parallelism that can be automatically extracted from a program. However, these challenges create a great opportunity for the development of new programming abstractions that hide the low-level architectural complexity while exposing intuitive high-level mechanisms for expressing parallelism.

Many models of parallel programming fall into the category of data-centric models, where the structure of an application depends on the role of data and communication in the relationships between tasks. Utilizing the inter-core communication networks and scaling effectively to large data sets are decidedly important in designing efficient implementations of parallel applications. The questions of how many low-level architectural details should be exposed to the programmer, and how much of an application's parallelism the programmer should expose to the compiler, remain open-ended, with different answers depending on the architecture and the application in question.
I propose that the key to unlocking the capabilities of multi-core platforms is the development of abstractions and optimizations that match the patterns of data movement in applications with the inter-core communication capabilities of the platforms.

After a comparative analysis that confirms and stresses the importance of finding a good match between the programming abstraction, the application, and the architecture, this dissertation proposes two techniques that showcase the power of leveraging data dependency patterns in parallel performance optimizations. Flexible Filters dynamically balance load in stream programs by creating flexibility in the runtime data flow through the addition of redundant stream filters. This technique combines a static mapping with dynamic flow control to achieve lightweight, distributed, and scalable throughput optimization. The properties of stream communication, i.e., FIFO pipes, enable flexible filters by exposing the backpressure dependencies between tasks.

Next, I present Huckleberry, a novel recursive programming abstraction that allows programmers to expose data locality in divide-and-conquer algorithms at a high level of abstraction. Huckleberry automatically converts sequential recursive functions with explicit data partitioning into parallel implementations that can be ported across changes in the underlying architecture, including the number of cores and the amount of on-chip memory.

I then present a performance model for multi-core applications which provides an efficient means to evaluate the trade-offs between the computational and communication requirements of applications and the hardware resources of a target multi-core architecture. The model encompasses all data-driven abstractions that can be reduced to a task graph representation and is extensible to performance techniques, such as Flexible Filters, that alter an application's original task graph.
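The flexible-filter idea of replicating a bottleneck stream filter that drains a shared FIFO can be illustrated with a minimal sketch. This is an illustration of the general pattern, not the dissertation's implementation: the names (`worker`, `run_pipeline`, `QUEUE_DEPTH`) are invented for this example, the static mapping is reduced to a fixed replica count, and threads stand in for cores.

```python
import queue
import threading

# Bounded queues model the FIFO pipes whose blocking behavior
# exposes backpressure between producer and consumer filters.
QUEUE_DEPTH = 4

def worker(fn, inbox, outbox):
    # Apply the filter `fn` to each item until a None sentinel arrives.
    while True:
        item = inbox.get()
        if item is None:
            return
        outbox.put(fn(item))

def run_pipeline(items, stage1, stage2, replicas):
    """Two-stage pipeline with `replicas` redundant copies of stage2."""
    q1 = queue.Queue(QUEUE_DEPTH)   # source  -> stage1
    q2 = queue.Queue(QUEUE_DEPTH)   # stage1  -> stage2 replicas
    out = queue.Queue()             # stage2  -> sink
    t1 = threading.Thread(target=worker, args=(stage1, q1, q2))
    # Redundant copies of the bottleneck filter drain the same FIFO,
    # so work flows at runtime to whichever replica is free.
    ts = [threading.Thread(target=worker, args=(stage2, q2, out))
          for _ in range(replicas)]
    for t in [t1, *ts]:
        t.start()
    for x in items:
        q1.put(x)
    q1.put(None)        # shut down stage1 ...
    t1.join()
    for _ in ts:        # ... then each replica of stage2
        q2.put(None)
    for t in ts:
        t.join()
    results = []
    while not out.empty():
        results.append(out.get())
    return results

results = run_pipeline(range(8), lambda x: x + 1, lambda x: x * x, replicas=2)
print(sorted(results))  # -> [1, 4, 9, 16, 25, 36, 49, 64]
```

Because two replicas consume from one queue, output order is nondeterministic; a real stream implementation must restore ordering downstream, a detail this sketch omits.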
Flexible Filters and Huckleberry address the challenges of parallel programming on multi-core architectures by taking advantage of properties specific to the stream and recursive paradigms, and the performance model creates a unifying framework based on the communication between tasks in parallel applications. Combined, these contributions demonstrate that specialization with respect to communication patterns enhances the ability of parallel programming abstractions and optimizations to harvest the power of multi-core platforms.
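The kind of task-graph reasoning the performance model builds on can be sketched as a small cost estimator. This is a simplified illustration under assumptions of my own, not the dissertation's model: `model_makespan` and its parameters are invented names, communication is charged only on inter-core edges, and contention between tasks placed on the same core is ignored.

```python
def model_makespan(tasks, edges, compute, comm, placement):
    """Earliest-finish-time estimate over a task graph.
    tasks:        task ids in topological order
    edges:        (u, v) dependency pairs
    compute[t]:   computation cost of task t
    comm[(u, v)]: cost of moving u's output to v across cores
    placement[t]: core assigned to t by the static mapping
    """
    preds = {t: [] for t in tasks}
    for u, v in edges:
        preds[v].append(u)
    finish = {}
    for t in tasks:
        ready = 0
        for u in preds[t]:
            # Inter-core edges pay the communication cost;
            # edges between tasks on the same core are free.
            cost = comm[(u, t)] if placement[u] != placement[t] else 0
            ready = max(ready, finish[u] + cost)
        finish[t] = ready + compute[t]
    return max(finish.values())

# A diamond-shaped task graph: a feeds b and c, which both feed d.
tasks = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
compute = {"a": 2, "b": 3, "c": 3, "d": 2}
comm = {e: 1 for e in edges}
placement = {"a": 0, "b": 0, "c": 1, "d": 0}
makespan = model_makespan(tasks, edges, compute, comm, placement)
print(makespan)  # -> 9: critical path a -> c -> d pays two inter-core hops
```

Changing `placement` reweights the communication terms, which is exactly the compute-versus-communication trade-off such a model lets one explore; a transformation like Flexible Filters would enter by rewriting `tasks` and `edges` before evaluation.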

Bibliographic Record

  • Author: Collins, Rebecca L.
  • Affiliation: Columbia University.
  • Degree Grantor: Columbia University.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 176 p.
  • Total Pages: 176
  • Format: PDF
  • Language: English

