Balanced Block Design Architecture for Parallel Computing in Mobile CPUs/GPUs

Abstract

To increase performance, processor manufacturers extract parallelism by shrinking transistors, adding more of them to single-core chips, and creating multi-core systems. Although microprocessor performance continues to grow at an exponential rate, this approach generates too much heat and consumes too much power. These architectures not only introduce several complications but also require tremendous effort to organize special software for parallel processing. In many cases, these difficulties are insurmountable. Programmers have to write complex code to prioritize tasks or to perform them in parallel, for example by extracting parallelism through threads on GPUs. One of the key issues for programmers is how to divide tasks into sub-tasks. A faulty calculation may lead to increased data dependency, which slows the processor. A processor that performs more parallel operations can at the same time increase queuing delays. In both scenarios, the cost of communication (also known as data transportation energy) between processing elements in a microprocessor (or objects in parallel programming) is increasing relative to that of computation. This trend results in larger caches with every new processor generation and in more complex and costly latency-tolerant mechanisms. Here we introduce a combinatorial architecture that has a unique property: multi-core execution of sequential code. This architecture can be used for both CPUs and GPUs. Some minor adjustments to a regular compiler are needed for loading. In particular, current mobile GPU technologies are still relatively immature and require substantial improvements before wireless devices can perform complex graphics-related functions. Our new architecture is more suitable for mobile GPUs/CPUs, i.e., mobile heterogeneous computing, with limited resources and relatively greater performance.

Bibliographic record

Source
International Conference on Computing for Geospatial Research and Application | 2013 | pp. 140-141 | 2 pages
Authors
Mani Ganapathy; Berkovich Simon; Liao Duoduo
Original format: PDF
Keywords
combinatorial architecture; fault-tolerance; mobile GPU; parallel computing

Similar literature

1. Enumerating Joint Weight of a Binary Linear Code Using Parallel Architectures: multi-core CPUs and GPUs [J]. Shohei Ando, Fumihiko Ino, Toru Fujiwara. International Journal of Networking and Computing. 2015, No. 2
2. Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters [J]. Kaipeng Li, Amanullah Ghazi, Chance Tarver. Journal of Signal Processing Systems for Signal, Image, and Video Technology. 2017, No. 3
3. A parallel solving method for block-tridiagonal equations on CPU-GPU heterogeneous computing systems [J]. Yang Wangdong, Li Kenli, Li Keqin. Journal of Supercomputing. 2017, No. 5
4. Balanced Block Design Architecture for Parallel Computing in Mobile CPUs/GPUs [C]. Mani Ganapathy, Berkovich Simon, Liao Duoduo. International Conference on Computing for Geospatial Research and Application. 2013
5. Architecture–Performance Interrelationship Analysis in Single/Multiple CPU/GPU Computing Systems: Application to Composite Process Flow Modeling [D]. Haney, Richard Harrison. 2013
6. Toward real-time diffuse optical tomography: accelerating light propagation modeling employing parallel computing on GPU and CPU [O]. Matthaios Doulgerakis, Adam T. Eggebrecht, Stanislaw Wojtkiewicz. 2017
7. Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters [O]. Li, Kaipeng; Ghazi, Amanullah; Tarver, Chance. 2016
