International Conference on Parallel and Distributed Processing Techniques and Applications

On the Scalability of Parallel Quicksort: A Case Study on Distributed vs. Shared-Memory Models


Abstract

The increasing availability of parallel systems affords undergraduates opportunities to experience firsthand the potential benefits and pitfalls of parallel programming. However, a clear understanding of the underlying architecture is critical to achieving the anticipated speedup over serial programs. To this end, analyzing performance metrics is essential for tuning parallel implementations and determining what improvement can be expected by scaling out or increasing the number of processes [1-4]. Without considering important aspects such as locality, load balancing, and the communication overheads associated with each available parallel system, students have a hard time obtaining the intended performance and determining which approach is suitable for solving the problem at hand. Finding the best mapping between a parallel implementation and the available parallel platform is difficult without clear guidance or practical experience. Given some legacy code, a first approach consists of looking at parallel programming patterns such as divide-and-conquer or decomposition techniques. This pattern has been widely used to design efficient algorithms for a variety of applications [5]. Students are already familiar with recursively partitioning a problem into smaller sub-problems, and identifying sub-problems that can be solved independently is an intuitive route to speedup. Even when this is not entirely the case, the approach allows students to quickly implement embarrassingly parallel programs using the message-passing and shared-address-space programming paradigms. Ultimately, performance metrics are collected for the serial and parallel implementations and contrasted, and the sources of overhead affecting the algorithms' performance are examined and discussed. This practical approach can be repeated on various classical problems (matrix multiplication, sorting, numerical integration, and the gravitational N-body problem) and extended to new problems by applying various parallel programming patterns.
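
To make the divide-and-conquer pattern concrete, the sketch below shows a shared-memory parallel quicksort using OpenMP tasks. It is a minimal illustration rather than the paper's implementation; the cutoff value and helper names are assumptions chosen for this example, and a distributed-memory counterpart would instead scatter sub-arrays across MPI processes.

```c
/* Minimal sketch of a shared-memory parallel quicksort with OpenMP tasks.
   Illustrative only (not the paper's code); CUTOFF and helper names are
   assumptions. Compile with: gcc -O2 -fopenmp quicksort_omp.c -o quicksort_omp */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define CUTOFF 1000  /* below this size, recurse serially to limit task overhead */

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition: returns the final index of the pivot. */
static int partition(int *a, int lo, int hi) {
    int pivot = a[hi];
    int i = lo - 1;
    for (int j = lo; j < hi; j++) {
        if (a[j] <= pivot) swap(&a[++i], &a[j]);
    }
    swap(&a[i + 1], &a[hi]);
    return i + 1;
}

/* The two recursive calls work on disjoint sub-arrays, so they can run as
   independent tasks: the divide-and-conquer pattern described above. */
static void quicksort(int *a, int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    if (hi - lo < CUTOFF) {          /* small sub-problem: stay serial */
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    } else {
        #pragma omp task shared(a)
        quicksort(a, lo, p - 1);
        #pragma omp task shared(a)
        quicksort(a, p + 1, hi);
        #pragma omp taskwait
    }
}

int main(void) {
    int n = 1 << 20;
    int *a = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) a[i] = rand();

    double t0 = omp_get_wtime();
    #pragma omp parallel          /* create the thread team */
    #pragma omp single            /* one thread seeds the task tree */
    quicksort(a, 0, n - 1);
    double t1 = omp_get_wtime();

    printf("sorted %d ints in %.3f s with %d threads\n",
           n, t1 - t0, omp_get_max_threads());
    free(a);
    return 0;
}
```

Timing the same sort with one thread and then with the full thread team yields the serial and parallel runtimes needed to compute speedup and efficiency, and exposes the sources of overhead, such as task creation and load imbalance, discussed above.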
