Cluster-based multivariate outlier identification and re-weighted regression in linear models

Alih Ekele; Ong Hong Choon

Abstract

A cluster methodology, motivated by a robust similarity matrix, is proposed for identifying likely multivariate outlier structure and for estimating weighted least-squares (WLS) regression parameters in linear models. The proposed method is an agglomeration of procedures that proceeds from clustering the n observations, through a test of the 'no-outlier hypothesis' (TONH), to weighted least-squares regression estimation. The cluster phase partitions the n observations into an h-set, called the main cluster, and a minor cluster of size n-h. A robust distance is derived from the main cluster, and the test of the no-outlier hypothesis is conducted on this distance. An initial WLS regression estimate is computed from the robust distance obtained from the main cluster. Until convergence, a re-weighted least-squares (RLS) regression estimate is updated with weights based on the normalized residuals. The proposed procedure thus links an agglomerative hierarchical cluster analysis with complete linkage, through the TONH, to the re-weighted regression estimation phase; hence, we propose to call it cluster-based re-weighted regression (CBRR). The CBRR is compared with three existing procedures using two data sets known to exhibit masking and swamping, and its performance is further examined through a simulation experiment. The results obtained from the data set illustrations and the Monte Carlo study show that the CBRR is effective in detecting multivariate outliers in settings where other methods are susceptible to masking and swamping. The CBRR does not require heavy computation and is largely resistant to masking and swamping.
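The pipeline in the abstract (cluster, robust distance, test, initial WLS, residual-based reweighting) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' CBRR. The robust similarity is assumed to be Euclidean distance on median/MAD-scaled data. The half-set size is assumed to be h = (n + p + 2) // 2 for p predictors plus an intercept. The TONH is replaced by a chi-square cutoff on squared Mahalanobis distances, with the quantile from the Wilson-Hilferty approximation. Weights are hard 0/1 weights with a normalized-residual cutoff of 2.5. The names `cbrr_sketch`, `complete_linkage_main_cluster` and `chi2_quantile` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def complete_linkage_main_cluster(Z, h):
    """Agglomerative complete-linkage clustering of the rows of Z.

    Stops as soon as one cluster holds at least h points and returns its
    row indices (the 'main cluster'); all other rows form the minor cluster.
    """
    n = Z.shape[0]
    # Assumption: dissimilarity = Euclidean distance after median/MAD scaling.
    med = np.median(Z, axis=0)
    mad = 1.4826 * np.median(np.abs(Z - med), axis=0)
    mad[mad == 0] = 1.0
    S = (Z - med) / mad
    C = np.sqrt(((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(C, np.inf)          # inf marks "no valid pair"
    members = {i: [i] for i in range(n)}
    while True:
        a, b = np.unravel_index(np.argmin(C), C.shape)
        i, j = int(min(a, b)), int(max(a, b))
        # Complete linkage: d(k, i+j) = max(d(k, i), d(k, j)).
        C[i, :] = np.maximum(C[i, :], C[j, :])
        C[:, i] = C[i, :]
        C[i, i] = np.inf
        C[j, :] = np.inf                 # cluster j is absorbed into i
        C[:, j] = np.inf
        members[i] += members.pop(j)
        if len(members[i]) >= h:
            return np.array(sorted(members[i]))


def chi2_quantile(q, k):
    """Wilson-Hilferty approximation to the chi-square quantile (no SciPy)."""
    z = NormalDist().inv_cdf(q)
    a = 2.0 / (9.0 * k)
    return k * (1.0 - a + z * np.sqrt(a)) ** 3


def cbrr_sketch(X, y, alpha=0.975, c=2.5, max_iter=50):
    """Hypothetical cluster-based re-weighted regression sketch."""
    n, p = X.shape
    Z = np.column_stack([X, y])
    h = (n + p + 2) // 2                 # assumed half-set size
    main = complete_linkage_main_cluster(Z, h)
    # Squared robust (Mahalanobis) distances w.r.t. the main cluster.
    mu = Z[main].mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(Z[main], rowvar=False))
    diff = Z - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Stand-in for the TONH (assumption): 0/1 weights from a chi-square cutoff.
    w = (d2 <= chi2_quantile(alpha, Z.shape[1])).astype(float)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    for _ in range(max_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        r = y - A @ beta
        s = np.sqrt(np.sum(w * r**2) / (w.sum() - A.shape[1]))
        # Re-weight: keep points whose normalized residual is within c.
        w_new = (np.abs(r / s) <= c).astype(float)
        if np.array_equal(w_new, w):
            break
        w = w_new
    return beta, np.flatnonzero(w_new == 0)
```

A call such as `beta, flagged = cbrr_sketch(X, y)` returns the coefficient vector (intercept first) and the indices whose normalized residuals exceed the cutoff under the final fit. The main cluster only seeds the first WLS fit; points it leaves out can be readmitted in the reweighting loop when their residuals are small.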

Bibliographic record

Source
Journal of Applied Statistics, 2015, Issue 6, pp. 938-955 (18 pages)
Authors
Alih Ekele; Ong Hong Choon
Affiliations

Universiti Sains Malaysia, School of Mathematical Sciences, George Town 11800, Malaysia (both authors)

Record information
Original format: PDF
Language: English
Keywords
addition point OLS; outliers; masking and swamping; test of no outlier hypothesis; half set; pivot point matrix; clean subset

Similar articles

1. Simultaneous variable selection and outlier identification in linear regression using the mean-shift outlier model [J]. Sung-Soo Kim, Sung H. Park, W. J. Krzanowski. Journal of Applied Statistics, 2008, Issue 3a4.
2. Outliers in Multivariate Regression Models [J]. Muni S. Srivastava, Dietrich von Rosen. Journal of Multivariate Analysis: An International Journal, 1998, Issue 2.
3. Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models [J]. Kristin Tøndel, Ulf G Indahl, Arne B Gjuvsland. BMC Systems Biology, 2011, Issue 1.
4. A Comparative Study of Linear and Nonlinear Regression Models for Outlier Detection [C]. Paul Inuwa Dalatu, Anwar Fitrianto, Aida Mustapha. International Conference on Soft Computing and Data Mining, 2017.
5. A Comparison of Unbiased, Biased, and Weighted Multiple Linear Regression Approaches to Support Educational Policy in the Identification of Outlier Schools (Ridge Regression, Prediction, Residual, Explanation, Needs, Assessment) [D]. Robert Ashley Bigelow, 1984.
6. Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models [O]. Kristin Tøndel, Ulf G Indahl, Arne B Gjuvsland, 2011.
7. wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression [O]. Tobias Schoch, 2021.
8. Detection of Outliers in Multivariate Linear Regression Model [R]. D. N. Naik, 1986.