Conference: International Conference on Computing for Geospatial Research and Application

Balanced Block Design Architecture for Parallel Computing in Mobile CPUs/GPUs

Abstract

To increase performance, processor manufacturers extract parallelism by shrinking transistors, adding more of them to single-core chips, and creating multi-core systems. Although microprocessor performance continues to grow at an exponential rate, this approach generates too much heat and consumes too much power. These architectures not only introduce several complications but also require tremendous effort to organize special software for parallel processing. In many cases, these difficulties are insurmountable. Programmers have to write complex code to prioritize tasks or to perform them in parallel, for example by extracting parallelism through threads in GPUs. One of the key issues for programmers is how to divide tasks into sub-tasks. A faulty calculation may lead to increased data dependency, which will slow the processor. A processor that performs more parallel operations can simultaneously increase queuing delays. In both of these scenarios, the cost of communication (also known as data transportation energy) between processing elements in a microprocessor (or objects in parallel programming) is increasing relative to that of computation. This trend is resulting in larger caches for every new processor generation and in more complex and costly latency-tolerant mechanisms. Here we introduce a combinatorial architecture that has a unique property: multi-core execution of sequential code. This architecture can be used for both CPUs and GPUs. Some minor adjustments to a regular compiler are needed for loading. In particular, current mobile GPU technologies are still relatively immature and require substantial improvements to enable wireless devices to perform complex graphics-related functions. Our new architecture is more suitable for mobile GPUs/CPUs, i.e., mobile heterogeneous computing, with limited resources and relatively greater performance.