Journal: Intelligent Data Analysis

Neighborhood relevant outlier detection approach based on information entropy

Abstract

Outlier detection is an interesting problem in data mining and machine learning. This paper proposes an information-entropy-based k-nearest neighborhood relevant outlier factor algorithm that combines Shannon information theory with a triangle pruning strategy. The algorithm takes into account data points whose k nearest neighbors lie on the edge of the neighborhood defined by a designated radius. In particular, it considers the influence of the neighborhood on each point, which addresses the problems of information concealment and submergence. First, information entropy is used to compute attribute weights that distinguish the importance of each attribute. Then, based on these weights, an improved pruning strategy removes some inliers to obtain a candidate outlier set, which reduces the computational complexity of the subsequent steps. Finally, using the weighted distance between each object in the candidate set and the objects in the original dataset, the algorithm computes the dissimilarity between each object and its k nearest neighbors; the r data points with the highest dissimilarity are reported as outliers. Experimental results show that, compared …
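The first and last stages described in the abstract (entropy-based attribute weights, then a weighted k-nearest-neighbor dissimilarity with the top r points reported) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the weights use the common entropy weight method (min-max scaling, column proportions, Shannon entropy normalized by ln(n), weight proportional to 1 minus entropy), which may differ from the authors' formula; the paper's neighborhood relevant outlier factor is replaced by the plain mean weighted distance to the k nearest neighbors; and the triangle pruning step is omitted, so every object is scored. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def min_max_scale(data: Sequence[Sequence[float]]) -> Matrix:
    """Rescale every attribute to [0, 1]; constant attributes become 0."""
    bounds = [(min(col), max(col)) for col in zip(*data)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in data
    ]


def entropy_weights(scaled: Matrix) -> List[float]:
    """Entropy weight method (assumed variant): low-entropy attributes get high weight."""
    n = len(scaled)
    if n < 2:
        raise ValueError("need at least two objects")
    raw = []
    for col in zip(*scaled):
        total = sum(col)
        if total == 0:
            raw.append(0.0)  # constant attribute: carries no information, weight 0
            continue
        probs = [v / total for v in col]
        # Shannon entropy normalized by ln(n) so that it lies in [0, 1].
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        raw.append(1.0 - entropy)
    s = sum(raw)
    m = len(raw)
    return [r / s for r in raw] if s > 0 else [1.0 / m] * m


def weighted_distance(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Attribute-weighted Euclidean distance."""
    return math.sqrt(sum(wj * (x - y) ** 2 for wj, x, y in zip(w, a, b)))


def knn_dissimilarity(scaled: Matrix, w: Sequence[float], k: int) -> List[float]:
    """Mean weighted distance from each object to its k nearest neighbors.

    A plain kNN score standing in for the paper's relevant outlier factor."""
    scores = []
    for i, p in enumerate(scaled):
        dists = sorted(weighted_distance(p, q, w) for j, q in enumerate(scaled) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


def top_r_outliers(data: Sequence[Sequence[float]], k: int, r: int) -> List[int]:
    """Return indices of the r objects with the highest kNN dissimilarity.

    No pruning step is applied here: every object is scored."""
    scaled = min_max_scale(data)
    w = entropy_weights(scaled)
    scores = knn_dissimilarity(scaled, w, k)
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    return ranked[:r]


if __name__ == "__main__":
    data = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.05], [5.0, 9.0]]
    print(top_r_outliers(data, k=2, r=1))  # → [4]
```

Scoring on min-max-scaled data keeps attributes with large numeric ranges from dominating the distance before the entropy weights are applied. The brute-force neighbor search in `knn_dissimilarity` costs O(n²) distance evaluations; shrinking the set of objects that must be scored, as the pruning step in the abstract is meant to do, is where such an approach would save time.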