Foreign-language journal: Scientific Programming

A Log-Based Anomaly Detection Method with Efficient Neighbor Searching and Automatic K Neighbor Selection



Abstract

Supervised anomaly detection based on the k-nearest neighbor (kNN) algorithm can produce more accurate results. However, kNN is inefficient at finding the k nearest neighbors in large-scale log data. Log data are also imbalanced in quantity, which makes it challenging to select proper k neighbors for different data distributions. In this paper, we propose a log-based anomaly detection method with efficient neighbor searching and automatic selection of k neighbors. First, we propose a neighbor search method based on MinHash and the MVP-tree. The MinHash algorithm groups similar logs into the same bucket, and an MVP-tree model is built for the samples in each bucket. This reduces both the cost of distance calculation and the number of neighbor samples that must be compared, which improves the efficiency of finding neighbors. For selecting the k neighbors, we propose an automatic method based on the Silhouette Coefficient that selects proper k neighbors to improve the accuracy of anomaly detection. We evaluate the method on six different types of log data to demonstrate its generality and feasibility.
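The bucketing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the `blake2b`-based hash family, the banding scheme, and the parameters `num_hashes` and `bands` are all assumptions chosen for illustration. The MVP-tree built inside each bucket is replaced here by a plain linear scan over the bucket's candidates, ranked by Jaccard distance.

```python
import hashlib
from collections import defaultdict


def tokens(log_line):
    """Split a log message into a set of lowercase whitespace tokens (assumed tokenizer)."""
    return set(log_line.lower().split())


def minhash_signature(token_set, num_hashes=16):
    """For each seeded hash function, keep the minimum hash value over all tokens."""
    signature = []
    for seed in range(num_hashes):
        min_value = min(
            (
                int.from_bytes(
                    hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest(),
                    "big",
                )
                for tok in token_set
            ),
            default=0,  # empty messages all share one constant signature
        )
        signature.append(min_value)
    return tuple(signature)


def bucket_logs(logs, num_hashes=16, bands=8):
    """Place each log index into one bucket per band of its signature (LSH banding)."""
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for idx, line in enumerate(logs):
        sig = minhash_signature(tokens(line), num_hashes)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)
    return buckets


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def nearest_in_buckets(query, logs, buckets, k=3, num_hashes=16, bands=8):
    """Return up to k (distance, index) pairs, comparing only against bucket mates.

    A linear scan stands in for the per-bucket MVP-tree described in the abstract.
    """
    rows = num_hashes // bands
    q_tokens = tokens(query)
    sig = minhash_signature(q_tokens, num_hashes)
    candidates = set()
    for b in range(bands):
        candidates.update(buckets.get((b, sig[b * rows:(b + 1) * rows]), []))
    ranked = sorted((jaccard_distance(q_tokens, tokens(logs[i])), i) for i in candidates)
    return ranked[:k]
```

The point of the sketch is that a query is compared only against logs sharing at least one band with it, so the number of distance computations depends on bucket size rather than on the size of the whole log set. Choosing k automatically with the Silhouette Coefficient is not shown, since the abstract does not give enough detail to reconstruct it.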
