International Conference on Parallel and Distributed Processing Techniques and Applications

On the Scalability of Parallel Quicksort: A Case Study on Distributed vs. Shared-Memory Models


Abstract

The increasing availability of parallel systems affords undergraduates opportunities to experience firsthand the potential benefits and pitfalls of parallel programming. However, a clear understanding of the underlying architecture is critical to achieving the anticipated speedup over serial programs. To this end, analyzing performance metrics is essential for tuning parallel implementations and determining what improvement can be expected by scaling out or increasing the number of processes [1-4]. Without considering important aspects such as locality, load balancing, and the communication overheads associated with each available parallel system, students have a hard time obtaining the intended performance and determining which approach is suitable for solving the problem at hand. Finding the best mapping between a parallel implementation and the available parallel platform is difficult without clear guidance or practical experience. Given some legacy code, a first approach consists of looking at parallel programming patterns such as divide-and-conquer or decomposition techniques. This pattern has been widely used to design efficient algorithms for a variety of applications [5]. Students are already familiar with recursively partitioning a problem into smaller sub-problems, and identifying sub-problems that can be solved independently is an intuitive route to speedup. Even when this is not entirely the case, the approach allows students to quickly implement embarrassingly parallel programs using the message-passing and shared-address-space programming paradigms. Ultimately, performance metrics are collected for the serial and parallel implementations and contrasted, and the sources of overhead affecting the algorithms' performance are examined and discussed. This practical approach can be repeated on various classical problems (matrix multiplication, sorting, numerical integration, and the gravitational N-body problem) and extended to new problems by applying various parallel programming patterns.
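
To make the divide-and-conquer pattern concrete, the sketch below shows a shared-memory parallel quicksort using OpenMP tasks. It is a minimal illustration rather than the paper's implementation; the cutoff value and helper names are assumptions chosen for this example, and a distributed-memory counterpart would instead scatter sub-arrays across MPI processes.

```c
/* Minimal sketch of a shared-memory parallel quicksort with OpenMP tasks.
   Illustrative only (not the paper's code); CUTOFF and helper names are
   assumptions. Compile with: gcc -O2 -fopenmp quicksort_omp.c -o quicksort_omp */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define CUTOFF 1000  /* below this size, recurse serially to limit task overhead */

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition: returns the final index of the pivot. */
static int partition(int *a, int lo, int hi) {
    int pivot = a[hi];
    int i = lo - 1;
    for (int j = lo; j < hi; j++) {
        if (a[j] <= pivot) swap(&a[++i], &a[j]);
    }
    swap(&a[i + 1], &a[hi]);
    return i + 1;
}

/* The two recursive calls work on disjoint sub-arrays, so they can run as
   independent tasks: the divide-and-conquer pattern described above. */
static void quicksort(int *a, int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    if (hi - lo < CUTOFF) {          /* small sub-problem: stay serial */
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    } else {
        #pragma omp task shared(a)
        quicksort(a, lo, p - 1);
        #pragma omp task shared(a)
        quicksort(a, p + 1, hi);
        #pragma omp taskwait
    }
}

int main(void) {
    int n = 1 << 20;
    int *a = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) a[i] = rand();

    double t0 = omp_get_wtime();
    #pragma omp parallel          /* create the thread team */
    #pragma omp single            /* one thread seeds the task tree */
    quicksort(a, 0, n - 1);
    double t1 = omp_get_wtime();

    printf("sorted %d ints in %.3f s with %d threads\n",
           n, t1 - t0, omp_get_max_threads());
    free(a);
    return 0;
}
```

Timing the same sort with one thread and then with the full thread team yields the serial and parallel runtimes needed to compute speedup and efficiency, and exposes the sources of overhead, such as task creation and load imbalance, discussed above.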
