
Detecting anomalies in high-performance parallel programs

Abstract

The Message Passing Interface (MPI) is an effective programming technique for implementing parallel programs for distributed computation. As these applications run, several types of irregularities can occur, including those that result from intrusions, user misbehavior, corrupted data, deadlocks, or failure of cluster components. We compare different artificial intelligence (AI) techniques that can be used to implement a lightweight monitoring and detection system for parallel applications on a cluster of Linux workstations. We study the accuracy and performance of deterministic and stochastic algorithms applied to the observed flow of library function calls and OS system calls made by parallel programs written with MPI. We demonstrate that MPI programs can be monitored in real time with high accuracy, in some cases with a 0% false positive rate, and we show that the added computational load on each node is small. Finally, we demonstrate that simple deterministic methods perform poorly when the program flow grows in size and variety, and that more complex methods are required.
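The abstract does not say which deterministic method was evaluated, so the following is only an illustrative sketch of one common approach to monitoring a call flow: record every short window of consecutive calls seen in a normal run, then flag windows in a new trace that were never seen. The function names `build_profile` and `mismatch_rate`, the window size of 3, and the example traces are all assumptions made for illustration, not details from the paper.

```python
from collections import deque
from typing import Iterable, Set, Tuple


def build_profile(trace: Iterable[str], window: int = 3) -> Set[Tuple[str, ...]]:
    """Collect every length-`window` call sequence seen in a normal trace."""
    profile: Set[Tuple[str, ...]] = set()
    recent: deque = deque(maxlen=window)
    for call in trace:
        recent.append(call)
        if len(recent) == window:
            profile.add(tuple(recent))
    return profile


def mismatch_rate(trace: Iterable[str],
                  profile: Set[Tuple[str, ...]],
                  window: int = 3) -> float:
    """Fraction of windows in `trace` that never appeared in the profile."""
    recent: deque = deque(maxlen=window)
    total = 0
    misses = 0
    for call in trace:
        recent.append(call)
        if len(recent) == window:
            total += 1
            if tuple(recent) not in profile:
                misses += 1
    return misses / total if total else 0.0


# Hypothetical traces: a normal MPI run and one with an unexpected system call.
normal = ["MPI_Init", "MPI_Comm_rank", "MPI_Send", "MPI_Recv",
          "MPI_Send", "MPI_Recv", "MPI_Finalize"]
suspect = ["MPI_Init", "MPI_Comm_rank", "MPI_Send", "execve",
           "MPI_Recv", "MPI_Finalize"]

profile = build_profile(normal)
print(mismatch_rate(normal, profile))   # no unseen windows in the training trace
print(mismatch_rate(suspect, profile))  # windows containing execve are unseen
```

A lookup table like this grows with every new legitimate call pattern, which is consistent with the abstract's remark that simple deterministic methods struggle as program flow grows in size and variety.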