Anomaly detection in large-scale coalition clusters for dependability assurance

Abstract

In large-scale high-performance computing systems, component failures are becoming the norm rather than the exception. The occurrence of failures, together with their impact on system performance and operating costs, is an increasingly important concern for system designers and administrators. When a compute node fails to function properly, health-related data are valuable for troubleshooting. However, it is challenging to identify anomalies effectively in a voluminous amount of noisy, high-dimensional data. Manual detection is time-consuming, error-prone, and does not scale well. In this paper, we present an autonomic mechanism for anomaly detection in coalition clusters. It comprises a set of techniques that facilitate automatic analysis of system health data. We apply data transformation to put health data into a uniform format. Feature selection then chooses the principal variables, which reduces the data size. Clustering and outlier detection are explored to identify nodes with anomalous behavior. We evaluate our prototype implementation on a production, institution-wide computational grid. The results show that our mechanism can effectively detect faulty nodes with high accuracy and low computation overhead.
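The three stages the abstract names (data transformation, feature selection, and outlier detection) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every technique below is an assumption chosen for brevity: median/MAD scaling stands in for the data transformation, a filter that drops constant and near-duplicate (highly correlated) metrics stands in for feature selection, and a k-nearest-neighbour distance score stands in for outlier detection (clustering is omitted). The function names, the one-metric-vector-per-node layout, and the thresholds `k`, `factor`, and `max_corr` are all hypothetical.

```python
import math
import statistics
from typing import Dict, List

# Hypothetical sketch, not the paper's method. Each node reports one vector of
# health metrics, in the same metric order for every node.


def robust_scale(column: List[float]) -> List[float]:
    """Data transformation: center a metric on its median and divide by its
    median absolute deviation (MAD), giving all metrics a unit-free scale."""
    med = statistics.median(column)
    scale = statistics.median([abs(v - med) for v in column])
    if scale == 0:
        # More than half the nodes share one value; fall back to the std dev.
        scale = statistics.pstdev(column)
    if scale == 0:
        return [0.0] * len(column)  # constant metric: carries no information
    return [(v - med) / scale for v in column]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if one is constant)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_features(columns: List[List[float]], max_corr: float = 0.95) -> List[int]:
    """Feature selection (assumed criterion): keep a metric only if it is
    non-constant and not near-perfectly correlated with one already kept."""
    kept: List[int] = []
    for j, col in enumerate(columns):
        if not any(col):
            continue  # all zeros after scaling, i.e. a constant metric
        if all(abs(pearson(col, columns[k])) < max_corr for k in kept):
            kept.append(j)
    return kept


def knn_scores(points: List[List[float]], k: int) -> List[float]:
    """Outlier score: Euclidean distance from each node to its k-th nearest
    neighbour. Requires more than k nodes."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def detect_anomalies(health: Dict[str, List[float]], k: int = 2,
                     factor: float = 3.0, max_corr: float = 0.95) -> List[str]:
    """Flag nodes whose k-NN score exceeds `factor` times the median score."""
    nodes = list(health)
    columns = [robust_scale(list(col)) for col in zip(*(health[n] for n in nodes))]
    kept = select_features(columns, max_corr)
    # Reduced data: one row per node, holding only the selected metrics.
    points = [[columns[j][i] for j in kept] for i in range(len(nodes))]
    scores = knn_scores(points, k)
    cutoff = factor * statistics.median(scores)
    return [n for n, s in zip(nodes, scores) if s > cutoff]


if __name__ == "__main__":
    # Illustrative data: [cpu_temp_C, load_avg, cpu_temp_F, num_cores] per node.
    health = {
        "n1": [50.0, 1.0, 122.0, 8.0],
        "n2": [51.0, 1.2, 123.8, 8.0],
        "n3": [49.0, 0.8, 120.2, 8.0],
        "n4": [50.0, 1.1, 122.0, 8.0],
        "n5": [51.0, 0.9, 123.8, 8.0],
        "n6": [50.0, 8.0, 122.0, 8.0],
    }
    print(detect_anomalies(health))  # prints ['n6']
```

In the sample data, feature selection discards the Fahrenheit column (a linear copy of the Celsius one) and the constant core count, so only two metrics reach the outlier stage. Scaling by the median and MAD rather than the mean and standard deviation is a deliberate choice here: a single faulty node cannot inflate the scale of the very metric in which it is anomalous.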

Bibliographic details

Source: International Conference on High Performance Computing, 2010, 10 pages
Authors: {missing}
Original format: PDF
Chinese Library Classification: TP301-53
Keywords: Anomaly detection; Autonomic systems; Coalition clusters; Compute grids; System dependability

Similar literature

1. Scalable Anomaly Detection for Large-Scale Heterogeneous Data in Cloud Using Optimal Elliptic Curve Cryptography and Gaussian Kernel Fuzzy C-Means Clustering [J]. Kumar P. Santhosh, Parthiban Latha. Journal of Circuits, Systems and Computers, 2020, no. 5.

2. ANOMALY DETECTION IN LARGE-SCALE TRAJECTORIES USING HYBRID GRID-BASED HIERARCHICAL CLUSTERING [J]. Ding Feng, Wang Jian, Ge Jiaqi. International Journal of Robotics & Automation, 2018, no. 5.

3. ADSTREAM: Anomaly Detection in Large-Scale Data Streams Using Local Outlier Factor Based on Micro-Cluster [J]. Advanced Science Letters, 2017, no. 10.

4. Anomaly detection in large-scale coalition clusters for dependability assurance [C]. 17th International Conference on High Performance Computing, 2010.

5. Dependable computing on inexact hardware through anomaly detection [D]. Khudia, Daya Shanker, 2015.

6. Reliable detection of fluence anomalies in EPID-based IMRT pretreatment quality assurance using pixel intensity deviations [O]. J. J. Gordon, J. K. Gardner, S. Wang.

7. Research on Algorithm of Dependability Oriented Anomaly Detection of Virtual Machines under Cloud [O]. Hongli Li, 2016.

8. Clustering and Recurring Anomaly Identification: Recurring Anomaly Detection System (ReADS) [R]. McIntosh, Dawn, 2006.

