Detecting Anomalies from End-to-End Internet Performance Measurements (PingER) Using Cluster Based Local Outlier Factor


Abstract

PingER (Ping End-to-End Reporting) is a worldwide end-to-end Internet performance measurement framework. It was developed by the SLAC National Accelerator Laboratory, Stanford, USA, and has been running for the last 20 years. It has more than 700 monitoring agents and remote sites that monitor the performance of Internet links in around 170 countries. At present, the compressed PingER data set is about 60 GB in size and comprises 100,000 flat files. The data is publicly available for Internet performance analyses. However, the data sets suffer from missing values and anomalies caused by congestion, bottleneck links, queue overflow, network software misconfiguration, hardware failure, cable cuts, and social upheavals. The objective of this paper is therefore to detect such performance drops or spikes, labeled as anomalies or outliers, in the PingER data set. In the proposed approach, the raw text files of the data set are transformed into a PingER dimensional model. The missing values are imputed using the k-NN algorithm. The data is partitioned into groups of similar instances using the k-means clustering algorithm. Afterward, clustering is integrated with the Local Outlier Factor (LOF) through the Cluster Based Local Outlier Factor (CBLOF) algorithm to detect the anomalies or outliers in the PingER data. Finally, the anomalies are further analyzed to identify the time frames and host locations that generate the major percentage of the anomalies in the PingER data set, which ranges from 1998 to 2016.
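The imputation, clustering, and scoring steps described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The abstract does not give the feature set, parameter values, or the exact CBLOF formulation, so everything below is an assumption: the two columns (average RTT in ms, packet loss in %) and all data values are synthetic, the helper names (`knn_impute`, `kmeans`, `cblof_scores`) are hypothetical, k-means uses a deterministic farthest-point initialization for reproducibility, and the large/small cluster split follows the commonly cited `alpha`/`beta` rule with a distance-based score that is only optionally weighted by cluster size.

```python
import math

def dist(a, b, dims=None):
    """Euclidean distance over the listed dimensions (all of them by default)."""
    dims = range(len(a)) if dims is None else dims
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in dims))

def knn_impute(rows, k=2):
    """Replace None with the column mean of the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        known = [i for i, v in enumerate(r) if v is not None]
        nearest = sorted(complete, key=lambda c: dist(r, c, known))[:k]
        filled.append([v if v is not None else sum(c[i] for c in nearest) / len(nearest)
                       for i, v in enumerate(r)])
    return filled

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with farthest-point initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(points, key=lambda p: min(dist(p, c) for c in centroids))))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def cblof_scores(points, labels, centroids, alpha=0.9, beta=5.0, weighted=False):
    """Distance-based CBLOF-style score; higher means more anomalous."""
    n = len(points)
    sizes = {j: labels.count(j) for j in sorted(set(labels))}
    order = sorted(sizes, key=sizes.get, reverse=True)
    # Clusters are "large" up to the first boundary where they cover alpha * n
    # points or the next cluster is at least beta times smaller.
    large, covered = [], 0
    for idx, j in enumerate(order):
        large.append(j)
        covered += sizes[j]
        nxt = order[idx + 1] if idx + 1 < len(order) else None
        if covered >= alpha * n or (nxt is not None and sizes[j] >= beta * sizes[nxt]):
            break
    scores = []
    for p, l in zip(points, labels):
        if l in large:
            d = dist(p, centroids[l])                       # own centroid
        else:
            d = min(dist(p, centroids[j]) for j in large)   # nearest large cluster
        scores.append(d * sizes[l] if weighted else d)
    return scores

# Synthetic rows: [average RTT in ms, packet loss in %]; None marks a missing value.
rows = [
    [20.0, 0.1], [22.0, 0.2], [21.0, None], [19.0, 0.1], [23.0, 0.3],
    [180.0, 1.0], [182.0, 1.2], [178.0, 0.9], [181.0, None], [179.0, 1.1],
    [900.0, 35.0],
]
filled = knn_impute(rows, k=2)
labels, centroids = kmeans(filled, k=3)
scores = cblof_scores(filled, labels, centroids)
print("most anomalous row:", filled[scores.index(max(scores))])
```

In this sketch the features are left unscaled, so RTT dominates the distance; a real analysis would likely normalize each column first. The threshold for turning scores into anomaly labels is also left open, since the abstract does not state one.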
