Journal: Engineering Applications of Artificial Intelligence

A semisupervised autoencoder-based approach for anomaly detection in high performance computing systems


Abstract

High Performance Computing (HPC) systems are complex machines with heterogeneous components that can break or malfunction. Automated anomaly detection in these systems is a challenging and critical task, as HPC systems are expected to work 24/7. Most current state-of-the-art methods for this problem are Machine Learning techniques or statistical models that rely on a supervised approach; that is, the detection mechanism is trained to recognize a fixed number of different states (i.e., normal and anomalous conditions).

In this paper a novel semi-supervised approach for anomaly detection in supercomputers is proposed, based on a type of neural network called an autoencoder. The approach learns the normal state of the supercomputer nodes and, after the training phase, can be used to discern anomalous conditions from normal behavior; in doing so it relies solely on data characterizing the normal state of the system. This differs from supervised methods, which require data sets with many examples of anomalous states, and such examples are in general very rare and/or hard to obtain.

The approach was tested on a real-life High Performance Computing system equipped with a monitoring infrastructure capable of generating large amounts of data describing the system state. The proposed approach outperforms the best current techniques for semi-supervised anomaly detection, with an increase in detection accuracy of around 12%. Two different implementations are discussed: one where each supercomputer node has its own model, and one with a single, generalized model for all nodes, in order to explore the trade-off between accuracy and ease of deployment.
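The idea of training only on normal data can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a toy one-hidden-unit linear autoencoder, three hypothetical standardized node metrics, and a detection rule (flag a sample when its reconstruction error exceeds the largest error seen on the normal training data) that the abstract does not specify. All names and values are illustrative.

```python
import random


def make_normal_samples(n, seed=1):
    """Synthetic 'normal' data: three hypothetical standardized node
    metrics that move together along one direction, plus small noise."""
    rng = random.Random(seed)
    direction = (0.4, 0.8, -0.4)  # assumed correlation pattern, not from the paper
    samples = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        samples.append([t * d + rng.gauss(0.0, 0.02) for d in direction])
    return samples


def train_autoencoder(samples, lr=0.05, epochs=200, seed=0):
    """Fit a linear autoencoder with one hidden unit using plain SGD
    on the squared reconstruction error."""
    rng = random.Random(seed)
    dim = len(samples[0])
    enc = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    dec = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(epochs):
        for x in samples:
            h = sum(w * xi for w, xi in zip(enc, x))       # encode
            err = [h * w - xi for w, xi in zip(dec, x)]    # decode minus input
            grad_h = 2.0 * sum(e * w for e, w in zip(err, dec))
            dec = [w - lr * 2.0 * h * e for w, e in zip(dec, err)]
            enc = [w - lr * grad_h * xi for w, xi in zip(enc, x)]
    return enc, dec


def reconstruction_error(model, x):
    """Squared error between a sample and its reconstruction."""
    enc, dec = model
    h = sum(w * xi for w, xi in zip(enc, x))
    return sum((h * w - xi) ** 2 for w, xi in zip(dec, x))


def is_anomalous(model, threshold, x):
    """Assumed detection rule: error above the threshold means anomaly."""
    return reconstruction_error(model, x) > threshold


# Train on normal samples only; no anomalous examples are needed.
normal_data = make_normal_samples(200)
model = train_autoencoder(normal_data)
threshold = max(reconstruction_error(model, x) for x in normal_data)

normal_probe = [0.2, 0.4, -0.2]    # follows the learned pattern
anomaly_probe = [0.4, -0.8, 0.4]   # breaks the correlation between metrics
print(is_anomalous(model, threshold, normal_probe))
print(is_anomalous(model, threshold, anomaly_probe))
```

In terms of the two implementations mentioned in the abstract, a per-node variant of this sketch would train one such model on each node's own data, while a generalized variant would train a single model on data pooled from all nodes.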
