Large scale anomaly detection in mixed numerical and categorical input spaces

Eiras-Franco Carlos; Martinez-Rego David; Guijarro-Berdinas Bertha; Alonso-Betanzos Amparo; Bahamonde Antonio

Abstract

This work presents the ADMNC method, designed to tackle anomaly detection in large-scale problems whose input variables are a mixture of categorical and numerical types. A flexible parametric probability measure is fitted to the input data, so that points with low likelihood values can be flagged as anomalies. The main contribution of this method is that, to cope with the mixed nature of the variables, we factorize the joint probability measure into two parts: the marginal density of the continuous variables and the conditional probability of the categorical variables given the continuous part of the feature vector. The result is a model trained through a maximum likelihood objective function optimized with stochastic gradient descent, which yields an effective and scalable algorithm. Compared with other well-known anomaly detection algorithms over several datasets, ADMNC is observed both to offer top-level accuracy on datasets that are out of reach for the most effective existing methods and to scale up well to very large datasets. This makes it a powerful tool for a problem of growing interest that currently lacks suitable scalable algorithms. (C) 2019 Elsevier Inc. All rights reserved.
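The factorization described above means a point's log-likelihood splits as log p(x_c, x_d) = log p(x_c) + log p(x_d | x_c), and the anomaly score is the negative of that sum. The following is a minimal sketch of that idea only, not the ADMNC model itself: it assumes a diagonal Gaussian for the continuous part (fitted in closed form for brevity), a single categorical variable modeled by softmax regression trained with stochastic gradient descent, and no distributed training. All function names are hypothetical.

```python
import math
import random


def fit_gaussian(xs):
    """Closed-form MLE of a diagonal Gaussian for the continuous part (assumed model)."""
    n, d = len(xs), len(xs[0])
    mu = [sum(x[k] for x in xs) / n for k in range(d)]
    # Floor the variance so a constant feature does not produce a zero variance.
    var = [max(sum((x[k] - mu[k]) ** 2 for x in xs) / n, 1e-6) for k in range(d)]
    return mu, var


def log_gaussian(x, mu, var):
    """log p(x_c) under the fitted diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mu, var))


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]


def class_probs(x, w):
    """p(x_d = c | x_c) for every category value c (linear logits plus bias)."""
    feats = list(x) + [1.0]
    return softmax([sum(wk * f for wk, f in zip(row, feats)) for row in w])


def fit_conditional(xs, cats, n_values, epochs=200, lr=0.1, seed=0):
    """Softmax regression for p(x_d | x_c), trained by SGD on the negative log-likelihood."""
    rng = random.Random(seed)
    d = len(xs[0])
    w = [[0.0] * (d + 1) for _ in range(n_values)]  # one row per value; last entry is the bias
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            feats = list(xs[i]) + [1.0]
            p = class_probs(xs[i], w)
            for c in range(n_values):
                g = p[c] - (1.0 if c == cats[i] else 0.0)  # d(NLL)/d(logit_c)
                for k in range(d + 1):
                    w[c][k] -= lr * g * feats[k]
    return w


def anomaly_scores(xs, cats, n_values):
    """Score = -log p(x_c) - log p(x_d | x_c); a larger score means a less likely point."""
    mu, var = fit_gaussian(xs)
    w = fit_conditional(xs, cats, n_values)
    return [-(log_gaussian(x, mu, var) + math.log(max(class_probs(x, w)[c], 1e-12)))
            for x, c in zip(xs, cats)]


if __name__ == "__main__":
    # Toy data: category 0 sits at negative x, category 1 at positive x.
    # The last point has an ordinary x but a category that is rare for that x.
    xs = [[-1.0 - 0.05 * i] for i in range(10)] + [[1.0 + 0.05 * i] for i in range(10)] + [[-1.2]]
    cats = [0] * 10 + [1] * 10 + [1]
    scores = anomaly_scores(xs, cats, n_values=2)
    print("most anomalous index:", max(range(len(xs)), key=scores.__getitem__))
```

In the toy example the last point would not stand out to a purely numerical detector, since its x value is typical; it is the conditional term log p(x_d | x_c) that penalizes it. Conversely, a point with an extreme x but a plausible category is penalized mainly by the marginal term.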

Bibliographic details

Source
Information Sciences: An International Journal | 2019 | 13 pages
Affiliations
Univ A Coruna, Res Ctr Informat & Commun Technol (CITIC), La Coruna 15071, Spain
Univ Oviedo, Ctr Inteligencia Artificial, Gijon 33204, Spain
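Record information
Original format: PDF
Language: English
Chinese Library Classification: Information theory; Computer applications; Information and knowledge dissemination; Automation technology, computer technology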
Keywords
Anomaly detection; Outlier detection; Scalability; Big data; Mixed data; Synthetic dataset generator

Similar articles

1. Large scale anomaly detection in mixed numerical and categorical input spaces [J]. Eiras-Franco Carlos, Martinez-Rego David, Guijarro-Berdinas Bertha. Information Sciences: An International Journal, 2019.
2. A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets [J]. Amir Ahmad, Lipika Dey. Pattern Recognition Letters, 2011, no. 7.
3. Three-dimensional numerical modeling of gravity anomalies based on Poisson equation in space-wavenumber mixed domain [J]. Dai Shi-Kun, Zhao Dong-Dong, Zhang Qian-Jiang. Applied Geophysics, 2018, no. 3.
4. MOIRE: Mixed-Order Poisson Regression towards Fine-grained Urban Anomaly Detection at Nationwide Scale [C]. Masamichi Shimosaka, Kota Tsubouchi, Yanru Chen. IEEE International Conference on Big Data, 2020.
5. Representing context-dependent categorical and mixed-value data systems for fault and anomaly detection: A highly scalable variable length Markov approach [D]. Brice, Pierre, 2009.
6. Practical Dyspnea Assessment: Relationship Between the 0–10 Numerical Rating Scale and the Four-Level Categorical Verbal Descriptor Scale of Dyspnea Intensity [O]. Nicholas G. Wysham, Benjamin J. Miriovsky, David C. Currow.
7. Anomaly subspace detection based on a multi-scale Markov random field model [O]. Arnon Goldman, Israel, 2004.
