Journal: SIGKDD Explorations

Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection

Abstract

Learning expressive low-dimensional representations of ultrahigh-dimensional data, e.g., data with thousands or millions of features, has been a major way to enable learning methods to address the curse of dimensionality. However, existing unsupervised representation learning methods mainly focus on preserving data regularity information and learn the representations independently of the subsequent outlier detection methods, which can result in suboptimal and unstable performance in detecting irregularities (i.e., outliers). This paper introduces a ranking model-based framework, called RAMODO, to address this issue. RAMODO unifies representation learning and outlier detection to learn low-dimensional representations that are tailored for a state-of-the-art outlier detection approach: the random distance-based approach. This customized learning yields better-optimized and more stable representations for the targeted outlier detectors. Additionally, RAMODO can leverage a small amount of labeled data as prior knowledge to learn more expressive and application-relevant representations. We instantiate RAMODO as an efficient method called REPEN to demonstrate the performance of RAMODO. Extensive empirical results on eight real-world ultrahigh-dimensional data sets show that REPEN (i) enables a random distance-based detector to obtain significantly better AUC performance and a speedup of two orders of magnitude; (ii) performs substantially better and more stably than four state-of-the-art representation learning methods; and (iii) leverages less than 1% labeled data to achieve up to 32% AUC improvement.
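
For readers unfamiliar with the term, the sketch below illustrates one simple form that a random distance-based outlier score can take: each point is scored by its nearest-neighbor distance to a small random subsample of the data. The abstract does not spell out the detector or the representation-learning step, so this is an assumption for illustration only and is not REPEN itself. The function name `random_distance_scores`, its parameters, and the toy data are all hypothetical.

```python
import math
import random


def random_distance_scores(points, subsample_size=8, n_rounds=1, seed=0):
    """Score each point by its nearest-neighbor distance to a random subsample.

    Larger scores suggest outliers. Scores are averaged over n_rounds
    independently drawn subsamples. (Illustrative assumption, not the
    paper's exact detector.)
    """
    rng = random.Random(seed)
    n = len(points)
    scores = [0.0] * n
    for _ in range(n_rounds):
        # Draw a small subsample of indices without replacement.
        sample_idx = rng.sample(range(n), min(subsample_size, n))
        for i, p in enumerate(points):
            # Nearest distance to the subsample, skipping the point itself.
            nearest = min(
                (math.dist(p, points[j]) for j in sample_idx if j != i),
                default=0.0,
            )
            scores[i] += nearest / n_rounds
    return scores


# Toy data: a tight cluster near the origin plus one far-away point.
toy_rng = random.Random(1)
data = [(toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)) for _ in range(50)]
data.append((25.0, 25.0))  # hypothetical outlier at index 50

scores = random_distance_scores(data, subsample_size=8, n_rounds=10)
top = max(range(len(data)), key=scores.__getitem__)
print("highest-scoring index:", top)
```

In raw ultrahigh-dimensional space such distances tend to become less informative, which is the motivation the abstract gives for first learning a low-dimensional representation tailored to this kind of detector.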
