Hybrid data-driven outlier detection based on neighborhood information entropy and its developmental measures

Zhong Yuan; Xianyong Zhang; Shan Feng


Abstract

Outliers carry distinctive and valuable information, so they play an important role in expert and intelligent systems, and outlier detection has been applied extensively in fields such as fraud detection, medical diagnosis, and public security. Rough set-based outlier detection methods have recently been studied in depth because they are data-driven and require no additional knowledge. However, classical rough set-based methods handle only categorical data; neighborhood rough sets accommodate numeric and heterogeneous data, but outlier detection based on them is currently restricted mainly to numeric data. Taking a hybrid data-driven approach, this paper investigates outlier detection using the neighborhood information entropy and its developmental measures, with applicability to categorical, numeric, and mixed data; the new method thereby extends both the traditional distance-based and rough set-based methods. Concretely, the neighborhood information system is first determined by a heterogeneous distance and a self-adapting radius; the neighborhood information entropy is then defined to measure overall uncertainty; three gradual information measures are further constructed to describe each individual object; and finally the neighborhood entropy-based outlier factor (NEOF) is established as an integrated measure to detect outliers. The NEOF-based outlier detection algorithm (called the NIEOD algorithm) is then designed and applied. In experiments on UCI data sets, the NIEOD algorithm is compared with six existing detection algorithms (NED, IE, SEQ, FindCBLOF, DIS, and KNN), and the results generally show better effectiveness and adaptability for the new method.
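The abstract names the pipeline stages but gives no formulas, so the following is only a minimal sketch of the general idea, not the paper's NEOF or NIEOD definitions. It assumes a simple mixed-attribute distance (absolute difference for numeric attributes already scaled to [0, 1], 0/1 mismatch for categorical ones), a fixed radius in place of the self-adapting radius, and the commonly used neighborhood entropy form -(1/|U|) Σ log2(|n(x)|/|U|). All function names, the `radius` value, and the toy data are hypothetical.

```python
import math

def hetero_distance(x, y, is_numeric):
    """Mean per-attribute distance over mixed data (an assumed form).

    Numeric attributes (assumed pre-scaled to [0, 1]) use |a - b|;
    categorical attributes use 0 for a match and 1 for a mismatch.
    """
    total = 0.0
    for a, b, num in zip(x, y, is_numeric):
        total += abs(a - b) if num else (0.0 if a == b else 1.0)
    return total / len(x)

def neighborhood_sizes(data, is_numeric, radius):
    """Count, for each object, the objects within `radius` (itself included)."""
    return [
        sum(1 for y in data if hetero_distance(x, y, is_numeric) <= radius)
        for x in data
    ]

def neighborhood_entropy(sizes):
    """Average of log2(|U| / |n(x)|): larger when neighborhoods are small."""
    n = len(sizes)
    return sum(math.log2(n / s) for s in sizes) / n

def isolation_scores(sizes):
    """Per-object entropy term, used here as a stand-in outlier score.

    This is NOT the paper's NEOF; it only illustrates that objects with
    small neighborhoods contribute more uncertainty.
    """
    n = len(sizes)
    return [math.log2(n / s) for s in sizes]

# Toy mixed data: one numeric attribute (scaled) and one categorical attribute.
data = [(0.10, "a"), (0.12, "a"), (0.15, "a"), (0.11, "a"), (0.95, "b")]
is_numeric = (True, False)
radius = 0.1  # fixed here; the paper uses a self-adapting radius instead

sizes = neighborhood_sizes(data, is_numeric, radius)
entropy = neighborhood_entropy(sizes)
scores = isolation_scores(sizes)
most_isolated = scores.index(max(scores))
print(sizes, round(entropy, 3), most_isolated)
```

In this toy run the last object differs in both attributes, so its neighborhood contains only itself and it receives the highest score. A faithful implementation would replace the fixed radius and the score with the definitions given in the full paper.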


Bibliographic details

Source
Expert Systems with Applications, 2018, Issue 12, pp. 243-257 (15 pages)
Authors
Zhong Yuan; Xianyong Zhang; Shan Feng
Author affiliations

College of Mathematics and Software Science, Sichuan Normal University; Institute of Intelligent Information and Quantum Information, Sichuan Normal University

College of Mathematics and Software Science, Sichuan Normal University; Institute of Intelligent Information and Quantum Information, Sichuan Normal University

College of Mathematics and Software Science, Sichuan Normal University

Original format: PDF
Language: English
Keywords
Outlier detection; Neighborhood rough set; Neighborhood information entropy; Hybrid data-driving; Data mining;


