Journal: Statistics and Computing

Outlyingness: Which variables contribute most?

Abstract

Outlier detection is an inevitable step in most statistical data analyses. However, merely detecting an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, typically flag the entire case as outlying or assign a single case weight to the entire case. In practice, particularly in high-dimensional data, an outlier will most likely not be outlying along all of its variables, but only along a subset of them. The question of why the case was flagged as an outlier then becomes of interest. In this article, a fast and efficient method is proposed to detect the variables that contribute most to an outlier's outlyingness, thereby helping the analyst understand in which way an outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that the problem of estimating that direction can be rewritten as the normed solution of a classical least squares regression problem. Identifying the subset of variables contributing most to outlyingness can thus be achieved by estimating the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably by the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The proposed method is demonstrated to perform well on both simulated data and real-life examples.
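The link between the direction of maximal outlyingness and least squares can be illustrated with a small numerical sketch. It is not the article's implementation: it assumes classical (non-robust) mean and covariance estimates, uses synthetic data with a planted outlier, and stops at the dense least squares solution, so the sparse SPLS/SNIPLS step is not shown. All variable names are illustrative.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 500 cases, 6 variables.
rng = np.random.default_rng(0)
n, p = 500, 6
X = rng.normal(size=(n, p))
X[0, [1, 4]] = 8.0  # case 0 is made outlying along variables 1 and 4 only

# Classical centering and covariance (a robust estimator could be swapped in).
Xc = X - X.mean(axis=0)
S = np.cov(X, rowvar=False)

# Direction a maximizing |a'(x - mean)| / sqrt(a' S a): proportional to S^{-1}(x - mean).
a = np.linalg.solve(S, Xc[0])
a /= np.linalg.norm(a)

# Same direction as a normed least squares solution:
# regress an indicator of case 0 on the centered data, then normalize.
y = np.zeros(n)
y[0] = 1.0
b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
b /= np.linalg.norm(b)

# Outlyingness along a equals the Mahalanobis distance of case 0.
outlyingness = abs(Xc[0] @ a) / np.sqrt(a @ S @ a)
mahalanobis = np.sqrt(Xc[0] @ np.linalg.solve(S, Xc[0]))

# Variables with the largest absolute weights in the direction.
top_two = sorted(np.argsort(-np.abs(a))[:2].tolist())
print(np.allclose(a, b), np.isclose(outlyingness, mahalanobis), top_two)
```

In this dense version every variable receives a nonzero weight, so the contributing variables can only be read off by ranking the absolute weights. Estimating the same regression sparsely, as the abstract suggests, would instead set the weights of non-contributing variables to exactly zero.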
