Journal: Experimental Mechanics

Reducing communication in parallel graph search algorithms with software caches

Abstract

In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: a few high-degree vertices connect many low-degree vertices. Despite the randomness of a graph search, it is possible to capitalize on these characteristics and cache relevant information about high-degree vertices. We applied this idea by caching remote vertex ids in a parallel breadth-first search benchmark. Our experiments with different implementations demonstrated significant performance improvements over the reference implementation in several configurations, using 64 to 1024 cores. We proposed a system design in which resources are dedicated exclusively to caching and shared among a set of nodes. Our evaluation demonstrates that this design reduces communication and has the potential to improve performance on large-scale systems in which the communication cost increases significantly with the distance between nodes. We also tested a memcached system as the cache server and found that its generic protocol, which does not match our usage semantics, significantly hinders the potential performance improvements; we suggest that a generic system should also support a basic, lightweight communication protocol to meet the needs of high-performance computing applications. Finally, we explored different configurations to find efficient ways to utilize the resources allocated to solve a given problem size; to this end, we found that utilizing half of the compute cores per allocated node improves performance, and even in this case, the caching variants always outperform the reference implementation.
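
To make the caching idea concrete, the following is a minimal single-process sketch, not the authors' implementation. It simulates a level-synchronous breadth-first search in which each simulated rank remembers the remote vertex ids it has already sent and skips repeat messages for them. The modulo partitioning, the function names `hub_and_ring` and `bfs_message_count`, and the toy hub-plus-ring graph are all assumptions made for illustration; the abstract does not describe the benchmark's data layout or cache policy.

```python
from collections import defaultdict


def hub_and_ring(n):
    """Toy small-world-like graph (hypothetical): hub 0 joined to a ring of n leaves."""
    adj = {0: list(range(1, n + 1))}
    for i in range(1, n + 1):
        left = n if i == 1 else i - 1
        right = 1 if i == n else i + 1
        adj[i] = [0, left, right]
    return adj


def bfs_message_count(adj, root, num_ranks, use_cache):
    """Simulate a level-synchronous BFS over a 1-D vertex partition.

    Returns (level, messages): the BFS level of every reached vertex and the
    number of remote "visit" messages sent between simulated ranks.
    """
    def owner(v):
        # Assumed partitioning: vertex v lives on rank v % num_ranks.
        return v % num_ranks

    level = {root: 0}
    frontier = [root]
    # One software cache per rank: remote vertex ids this rank has already sent.
    sent_cache = [set() for _ in range(num_ranks)]
    messages = 0
    depth = 0

    while frontier:
        depth += 1
        inbox = defaultdict(list)  # destination rank -> vertex ids to visit
        for v in frontier:
            src = owner(v)
            for u in adj[v]:
                dst = owner(u)
                if dst == src:
                    inbox[dst].append(u)  # local edge, no communication
                    continue
                if use_cache:
                    if u in sent_cache[src]:
                        continue  # owner was already told about u; skip the send
                    sent_cache[src].add(u)
                messages += 1
                inbox[dst].append(u)

        # Each owner marks newly discovered vertices and builds the next frontier.
        next_frontier = []
        for rank in range(num_ranks):
            for u in inbox[rank]:
                if u not in level:
                    level[u] = depth
                    next_frontier.append(u)
        frontier = next_frontier

    return level, messages


if __name__ == "__main__":
    graph = hub_and_ring(8)
    for cached in (False, True):
        levels, sent = bfs_message_count(graph, root=1, num_ranks=4, use_cache=cached)
        print(f"use_cache={cached}: {sent} remote messages, {len(levels)} vertices reached")
```

Skipping a cached id is safe in this sketch because an id enters a rank's cache only after a visit message for it has gone out, so its owner has already assigned it a level no larger than the current one. The sketch counts messages only; it says nothing about the timing results reported in the abstract.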