Conference paper: International Conference on Data Mining

Detecting Outliers in High-Dimensional Datasets with Mixed Attributes


Abstract

Outlier detection has attracted substantial attention in many applications and research areas, for example the detection of network intrusions or credit card fraud. Many existing approaches are based on pairwise distances among all points in the dataset. These approaches do not extend easily to current datasets, which usually contain a mix of categorical and continuous attributes and may be distributed over large geographical areas. In addition, current datasets usually have a large number of dimensions. Such datasets tend to be sparse, and traditional concepts such as Euclidean distance or nearest neighbor become unsuitable. We propose ODMAD, a fast outlier detection strategy intended for datasets containing mixed attributes. ODMAD takes the sparseness of the dataset into consideration, and experiments show that it is highly scalable with respect to both the number of points and the number of attributes in the dataset.
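One way to see why Euclidean distance and nearest-neighbor notions lose meaning as the number of dimensions grows is to measure the relative contrast (d_max - d_min) / d_min between a query point and a set of other points. The sketch below is an illustration only, not part of ODMAD: it assumes continuous points drawn uniformly from the unit hypercube, so it does not model sparsity or categorical attributes, and the function name `relative_contrast` is hypothetical. As the dimension increases, the contrast shrinks, meaning the nearest and farthest points become almost equally far from the query.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """Return (max_dist - min_dist) / min_dist from one random query point
    to n_points random points, all drawn uniformly from [0, 1]^dim.

    Illustrative assumption: uniform continuous data (no sparsity,
    no categorical attributes)."""
    rng = random.Random(seed)  # fixed seed for reproducible runs
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Contrast drops sharply as dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Running it prints one contrast value per dimension; the exact numbers depend on the seed, but the downward trend is the point of the demonstration.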
