A Log-Based Anomaly Detection Method with Efficient Neighbor Searching and Automatic K Neighbor Selection

Bingming Wang; Shi Ying; Zhe Yang

Abstract

Supervised anomaly detection with the k-nearest neighbor (kNN) algorithm can yield more accurate results. However, kNN is inefficient at finding the k nearest neighbors in large-scale log data. Log data are also imbalanced in quantity, which makes it challenging to select proper k neighbors for different data distributions. In this paper, we propose a log-based anomaly detection method with efficient neighbor searching and automatic selection of k neighbors. First, we propose a neighbor search method based on MinHash and the MVP-tree. The MinHash algorithm groups similar logs into the same bucket, and an MVP-tree model is built for the samples in each bucket. This reduces both the effort of distance calculation and the number of neighbor samples that need to be compared, which improves the efficiency of finding neighbors. Second, for selecting the k neighbors, we propose an automatic method based on the Silhouette Coefficient, which selects proper k neighbors to improve the accuracy of anomaly detection. We verify the method on six different types of log data to demonstrate its universality and feasibility.
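
The bucketing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the seeded BLAKE2 hash family, the banding scheme, and all function names and parameter values (`num_hashes`, `bands`) are assumptions chosen for illustration. It shows only how MinHash signatures can place similar log lines in shared buckets; the MVP-tree built per bucket is not reproduced here.

```python
import hashlib
from collections import defaultdict


def tokenize(log_line):
    """Split a log line into a set of lowercase whitespace-delimited tokens."""
    return set(log_line.lower().split())


def seeded_hash(seed, token):
    """Deterministic 64-bit hash of a token; each seed acts as one hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(token_set, num_hashes=32):
    """Return the MinHash signature: the minimum hash value under each seed."""
    if not token_set:
        raise ValueError("cannot compute a signature for an empty token set")
    return tuple(
        min(seeded_hash(seed, tok) for tok in token_set)
        for seed in range(num_hashes)
    )


def bucket_logs(logs, num_hashes=32, bands=16):
    """Map each (band index, band values) key to the indices of logs sharing it."""
    if num_hashes % bands != 0:
        raise ValueError("num_hashes must be divisible by bands")
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for idx, line in enumerate(logs):
        sig = minhash_signature(tokenize(line), num_hashes)
        # Assumed banding scheme: each slice of the signature is one bucket key.
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].append(idx)
    return buckets


def candidate_neighbors(buckets, query_idx):
    """Indices of all other logs that share at least one bucket with the query."""
    found = set()
    for members in buckets.values():
        if query_idx in members:
            found.update(members)
    found.discard(query_idx)
    return found
```

With buckets built this way, exact distance computations (in the paper, the search over the per-bucket MVP-tree) only need to run against `candidate_neighbors(buckets, i)` rather than the full log set.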

Bibliographic details

Source: Scientific Programming, 2020, Issue 3, 17 pages
Authors: Bingming Wang; Shi Ying; Zhe Yang
Original format: PDF
