Journal of Applied Statistics

Cluster-based multivariate outlier identification and re-weighted regression in linear models


Abstract

A cluster-based methodology, motivated by a robust similarity matrix, is proposed for identifying likely multivariate outlier structure and for estimating weighted least-squares (WLS) regression parameters in linear models. The proposed method is a combination of procedures that runs from clustering the n observations, through a test of the 'no-outlier hypothesis' (TONH), to weighted least-squares regression estimation. The cluster phase partitions the n observations into a main cluster of size h (the h-set) and a minor cluster of size n-h. A robust distance is derived from the main cluster, and the test of the no-outlier hypothesis is conducted on it. An initial WLS regression estimate is computed from the robust distance obtained from the main cluster. A re-weighted least-squares (RLS) regression estimate is then updated, with weights based on the normalized residuals, until convergence. The proposed procedure blends an agglomerative hierarchical cluster analysis with complete linkage, through the TONH, into the re-weighted regression estimation phase. Hence, we propose to call it cluster-based re-weighted regression (CBRR). The CBRR is compared with three existing procedures using two data sets known to exhibit masking and swamping. The performance of CBRR is further examined through a simulation experiment. The results obtained from the data-set illustrations and the Monte Carlo study show that the CBRR is effective in detecting multivariate outliers in settings where other methods are susceptible to masking and swamping. The CBRR does not require enormous computation and is largely not susceptible to masking and swamping.
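The pipeline described in the abstract (complete-linkage clustering, a robust distance from the main cluster, an initial WLS fit, then re-weighting on normalized residuals until convergence) can be illustrated with a rough Python sketch. The abstract does not specify the robust similarity matrix, the TONH test statistic, or the weight function, so this sketch substitutes plain Euclidean distance, a chi-square cutoff on squared Mahalanobis distances, and hard 0/1 weights with a MAD residual scale. All function names and the parameters `alpha`, `c`, `tol`, and `max_iter` are hypothetical, and the code should be read as an illustration of the general idea rather than as the authors' CBRR procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import chi2


def main_cluster_mask(Z):
    """Cut a complete-linkage tree into two clusters; True marks the larger one."""
    labels = fcluster(linkage(Z, method="complete"), t=2, criterion="maxclust")
    return labels == np.bincount(labels).argmax()


def robust_distances(Z, mask):
    """Squared Mahalanobis distance of every row from the main cluster's mean/covariance."""
    center = Z[mask].mean(axis=0)
    cov = np.atleast_2d(np.cov(Z[mask], rowvar=False))
    diff = Z - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


def wls(A, y, w):
    """Weighted least squares: minimise sum_i w_i * (y_i - A_i @ beta)^2."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta


def cluster_reweighted_fit(X, y, alpha=0.975, c=2.5, tol=1e-8, max_iter=50):
    """Illustrative cluster -> distance cutoff -> WLS -> re-weighting loop."""
    Z = np.column_stack([X, y])            # assumption: cluster on (X, y) jointly
    A = np.column_stack([np.ones(len(y)), X])

    # Assumed stand-in for the TONH: chi-square cutoff on the robust distances.
    d2 = robust_distances(Z, main_cluster_mask(Z))
    w = (d2 <= chi2.ppf(alpha, df=Z.shape[1])).astype(float)
    beta = wls(A, y, w)                    # initial WLS estimate

    for _ in range(max_iter):
        r = y - A @ beta
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD scale
        if scale == 0:
            break
        # Assumed weight function: hard rejection on normalized residuals.
        w = (np.abs(r / scale) <= c).astype(float)
        new_beta = wls(A, y, w)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, w
```

Calling `cluster_reweighted_fit(X, y)` with an `(n, p)` predictor array and a length-`n` response returns the coefficient vector (intercept first) and the final 0/1 weights. Hard rejection is used here only because it is simple; a smooth weight function would fit the same loop.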
