Journal: IEEE Transactions on Information Theory

A Compressed PCA Subspace Method for Anomaly Detection in High-Dimensional Data

Abstract

Random projection is widely used as a method of dimension reduction. In recent years, its combination with standard techniques of regression and classification has been explored. Here, we examine its use for anomaly detection in high-dimensional settings, in conjunction with principal component analysis (PCA) and corresponding subspace detection methods. We assume a so-called spiked covariance model for the underlying data generation process and a Gaussian random projection. We adopt a hypothesis testing perspective of the anomaly detection problem, with the test statistic defined to be the magnitude of the residuals of a PCA analysis. Under the null hypothesis of no anomaly, we characterize the relative accuracy with which the mean and variance of the test statistic from compressed data approximate those of the corresponding test statistic from uncompressed data. Furthermore, under a suitable alternative hypothesis, we provide expressions that allow for a comparison of statistical power for detection. Finally, whereas these results correspond to the ideal setting in which the data covariance is known, we show that it is possible to obtain the same order of accuracy when the covariance of the compressed measurements is estimated using a sample covariance, as long as the number of measurements is of the same order of magnitude as the reduced dimensionality. We illustrate the practical impact of our results in the context of predicting volume anomalies in Internet traffic data.
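The pipeline the abstract describes (spiked covariance data, a Gaussian random projection, and a test statistic equal to the magnitude of the PCA residual) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the dimensions, the spike values, and the 1/sqrt(p) scaling of the projection matrix are all assumptions chosen for illustration, and the covariance is treated as known, as in the ideal setting the abstract mentions.

```python
import numpy as np


def residual_statistic(x, cov, k):
    """Squared norm of the part of x lying outside the top-k eigenvectors of cov."""
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues are returned in ascending order
    top = eigvecs[:, -k:]                 # top-k principal directions
    residual = x - top @ (top.T @ x)      # remove the principal-subspace component
    return float(residual @ residual)


def spiked_covariance(n, spikes, rng):
    """Identity plus a low-rank term with random orthonormal directions (illustrative)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, len(spikes))))
    return np.eye(n) + q @ np.diag(spikes) @ q.T


rng = np.random.default_rng(0)
n, p, k = 200, 50, 3                      # original dim, reduced dim, number of spikes (assumed)
cov = spiked_covariance(n, np.array([50.0, 30.0, 20.0]), rng)

# One observation drawn under the null hypothesis (no anomaly).
x = rng.multivariate_normal(np.zeros(n), cov)

# Gaussian random projection; the 1/sqrt(p) scaling is an assumption of this sketch.
phi = rng.standard_normal((p, n)) / np.sqrt(p)
y = phi @ x                               # compressed observation
cov_y = phi @ cov @ phi.T                 # covariance of the compressed data (known-covariance case)

q_full = residual_statistic(x, cov, k)    # statistic from uncompressed data
q_comp = residual_statistic(y, cov_y, k)  # statistic from compressed data
print(q_full, q_comp)
```

Repeating the draw of `x` many times and comparing the sample mean and variance of `q_full` and `q_comp` is one way to explore numerically the kind of relative-accuracy comparison the abstract refers to; replacing `cov_y` with a sample covariance of compressed measurements corresponds to the estimated-covariance case.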
