Journal: Expert Systems with Applications

Hybrid data-driven outlier detection based on neighborhood information entropy and its developmental measures



Abstract

Outliers carry distinctive and valuable information, so they play an important role in expert and intelligent systems, and outlier detection has been widely applied in fields such as fraud detection, medical diagnosis, and public security. Rough set-based outlier detection methods have recently been studied in depth because they are data-driven and require no additional knowledge. However, classical rough set-based methods handle only categorical data; neighborhood rough sets can handle numeric and heterogeneous data, but their use in outlier detection is so far mainly restricted to numeric data. Taking a hybrid data-driven approach, this paper investigates outlier detection based on the neighborhood information entropy and its developmental measures, with applicability to categorical, numeric, and mixed data; the new method thereby extends both the traditional distance-based and rough set-based methods. Concretely, the neighborhood information system is first determined by a heterogeneous distance and a self-adapting radius; the neighborhood information entropy is then defined to measure overall uncertainty; three gradual information measures are further constructed to describe each individual object; and finally the neighborhood entropy-based outlier factor (NEOF) is established as an integrated measure to detect outliers. A NEOF-based outlier detection algorithm, called the NIEOD algorithm, is then designed and applied. In experiments on UCI data sets, the NIEOD algorithm is compared with six existing detection algorithms (NED, IE, SEQ, FindCBLOF, DIS, and KNN), and the results generally show better effectiveness and adaptability for the new method.
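
The abstract gives no formulas, so the following Python sketch only illustrates the general pipeline it describes: a heterogeneous distance over mixed attributes, neighborhoods derived from it, and an entropy computed from neighborhood sizes. Everything specific here is an assumption rather than the paper's definition: the distance is a HEOM-style combination (overlap on categorical attributes, range-normalised difference on numeric ones), the radius is a fixed constant instead of a self-adapting one, the entropy follows the common form -(1/n) Σ log2(|δ(x)|/n), and `sparsity_scores` is a simplified stand-in for NEOF, not NEOF itself. All function names are hypothetical.

```python
import math


def hetero_distance(x, y, numeric, ranges):
    """HEOM-style distance (an assumption, not the paper's exact definition)."""
    total = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        if numeric[j]:
            # Numeric attribute: absolute difference scaled by the attribute range.
            d = abs(a - b) / ranges[j] if ranges[j] > 0 else 0.0
        else:
            # Categorical attribute: 0 if equal, 1 otherwise.
            d = 0.0 if a == b else 1.0
        total += d * d
    return math.sqrt(total)


def neighborhoods(data, numeric, radius):
    """Return, for each object, the set of indices within `radius` of it."""
    ranges = [
        (max(r[j] for r in data) - min(r[j] for r in data)) if numeric[j] else 1.0
        for j in range(len(numeric))
    ]
    return [
        {k for k, y in enumerate(data)
         if hetero_distance(x, y, numeric, ranges) <= radius}
        for x in data
    ]


def neighborhood_entropy(nbrs):
    """Entropy of the whole system: -(1/n) * sum(log2(|neighborhood| / n))."""
    n = len(nbrs)
    return -sum(math.log2(len(s) / n) for s in nbrs) / n


def sparsity_scores(nbrs):
    """Simplified per-object score (NOT the paper's NEOF): small neighborhood -> high score."""
    n = len(nbrs)
    return [1.0 - len(s) / n for s in nbrs]


# Toy mixed data: one numeric and one categorical attribute (made-up values).
data = [(1.0, "a"), (1.1, "a"), (0.9, "a"), (1.05, "a"), (9.0, "b")]
numeric = [True, False]

nbrs = neighborhoods(data, numeric, radius=0.2)  # fixed radius is an assumption
entropy = neighborhood_entropy(nbrs)
scores = sparsity_scores(nbrs)
most_outlying = max(range(len(data)), key=lambda i: scores[i])
print(entropy, scores, most_outlying)
```

In this toy data set the last object sits alone in its neighborhood and therefore receives the highest score. A fuller treatment would replace the fixed radius and the stand-in score with the self-adapting radius and the three gradual measures defined in the paper.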
