A New Algorithm for Detecting Outliers in Linear Regression

Mehmet Satman


Abstract

In this paper, we present a new algorithm for detecting multiple outliers in linear regression. The algorithm is based on a non-iterative robust covariance matrix and on the concentration steps used in LTS (least trimmed squares) estimation. A robust covariance matrix is constructed to calculate the Mahalanobis distances of the independent variables, and these distances are then used as weights in weighted least squares estimation. A few concentration steps are then performed using the observations with the smallest residuals. We generate random data sets for $n = 10^3, 10^4, 10^5$ and $p = 5, 10$ to demonstrate the capabilities of the algorithm. Our Monte Carlo simulations show that the algorithm has very low masking and swamping ratios for up to $10^4$ observations under maximum contamination in X-space. They also show that the algorithm succeeds with Y-space outliers for contamination levels up to $30\%$, sample sizes up to $n = 10^5$, and up to $p = 10$ parameters. Bias, variance, and MSE statistics are calculated for different scenarios. The reported computation time of our implementation is short. We conclude that the algorithm is suitable for detecting multiple outliers in regression analysis, given its small masking and swamping ratios, its accurate estimates of the regression parameters other than the intercept, and its short computation time on large data sets with high levels of contamination. Future work is needed to reduce the bias and variance of the intercept estimator.
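The abstract describes the pipeline (robust covariance, Mahalanobis-distance weights, a weighted least squares start, then a few concentration steps) but not its exact formulas. The following NumPy sketch illustrates that general shape under stated assumptions; it is not the author's implementation. The robust covariance construction (MAD-scaled screening of the h rows nearest the coordinate-wise median), the weight function `w_i = min(1, median(d) / d_i)`, the coverage `h`, the default number of concentration steps, and the 3-MAD flagging cutoff are all hypothetical choices, as are the helper names. `masking_swamping` uses one common definition of the two ratios.

```python
import numpy as np


def robust_center_scatter(X, h):
    # HYPOTHETICAL non-iterative robust estimate (not the paper's construction):
    # scale columns by their MAD, keep the h rows nearest the coordinate-wise
    # median in a single pass, and return the mean/covariance of those rows.
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0)
    mad[mad == 0] = 1.0  # guard against constant columns
    dist = np.sqrt((((X - med) / mad) ** 2).sum(axis=1))
    core = np.argsort(dist)[:h]
    return X[core].mean(axis=0), np.atleast_2d(np.cov(X[core], rowvar=False))


def mahalanobis(X, center, scatter):
    # d_i = sqrt((x_i - c)^T S^{-1} (x_i - c)) for every row of X.
    diff = X - center
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(scatter), diff))


def detect_outliers(X, y, n_csteps=5, cutoff=3.0):
    n, p = X.shape
    h = (n + p + 2) // 2  # LTS-style coverage for p predictors + intercept (assumed)
    center, scatter = robust_center_scatter(X, h)
    d = mahalanobis(X, center, scatter)
    # ASSUMED weight function: rows farther than the median distance in
    # X-space are downweighted in proportion to their distance.
    w = np.minimum(1.0, np.median(d) / np.maximum(d, 1e-12))

    Z = np.column_stack([np.ones(n), X])  # design matrix with intercept
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)  # WLS start

    # Concentration steps: refit by OLS on the h smallest absolute residuals.
    for _ in range(n_csteps):
        subset = np.argsort(np.abs(y - Z @ beta))[:h]
        beta, *_ = np.linalg.lstsq(Z[subset], y[subset], rcond=None)

    # ASSUMED flagging rule: |residual| above `cutoff` robust (MAD) scales.
    r = y - Z @ beta
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))
    return beta, np.abs(r) / scale > cutoff


def masking_swamping(flagged, is_outlier):
    # One common definition (assumed): masking = share of true outliers missed,
    # swamping = share of clean observations wrongly flagged.
    masking = np.mean(~flagged[is_outlier])
    swamping = np.mean(flagged[~is_outlier])
    return masking, swamping
```

For a quick experiment, one can simulate clean data, shift a block of responses to create Y-space outliers, call `detect_outliers(X, y)`, and pass the returned flags to `masking_swamping` along with the known outlier mask. Starting from a weighted fit instead of many random elemental subsets keeps the initial stage non-iterative. Each concentration step can only keep or lower the trimmed sum of squared residuals over the h-subset, so a handful of steps is usually enough to settle.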


Bibliographic details

Source: International Journal of Statistics and Probability, 2013, No. 3, 9 pages
Author: Mehmet Satman
Original format: PDF
Chinese Library Classification: Probability theory and mathematical statistics

Similar literature

1. GIS-based landslide susceptibility mapping using numerical risk factor bivariate model and its ensemble with linear multivariate regression and boosted regression tree algorithms [J]. Alireza ARABAMERI, Biswajeet PRADHAN, Khalil REZAEI. Journal of Mountain Science, 2019, No. 3.
2. Effects of Multicollinearity on Type I Error of Some Methods of Detecting Heteroscedasticity in Linear Regression Model [J]. Olusegun Olatayo Alabi, Kayode Ayinde, Omowumi Esther Babalola. Open Journal of Statistics, 2020, No. 4.
3. Bayesian Segmentation of Piecewise Linear Regression Models Using Reversible Jump MCMC Algorithm [J]. Suparman, Michel Doisy. Computer Technology and Application, 2015, No. 1.
4. Linear-regression models and algorithms based on the Total-Least-Squares principle [J]. Ding Shijun, Jiang Weiping, Shen Zhijuan. Geodesy and Geodynamics, 2012, No. 2.
5. A Dynamic Programming Track-Before-Detect Algorithm Based on Local Linearization for Non-Gaussian Clutter Background [J]. ZHENG Daikun, WANG Shouyong, QIN Xing. Chinese Journal of Electronics, 2016, No. 3.
6. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating [J]. Yong He, Zhongmin Cui, Yu Fang. Applied Psychological Measurement, 2013, No. 7.
7. Detecting multiple outliers in linear regression using a cluster method combined with graphical visualization [J]. Sung-Soo Kim, W. J. Krzanowski. Computational Statistics, 2007, No. 1.
8. USING ROBUST SCALE ESTIMATES IN DETECTING MULTIPLE OUTLIERS IN LINEAR REGRESSION [J]. Swallow W. H., Kianifard F. Biometrics: Journal of the Biometric Society: An International Society Devoted to the Mathematical and Statistical Aspects of Biology, 1996, No. 2.
9. The Effect of Different Distance Measures in Detecting Outliers using Clustering-based Algorithm for Circular Regression Model [C]. Nur Faraidah Muhammad Di, Siti Zanariah Satari. ISM International Statistical Conference, 2017.
10. A COMPARISON OF UNBIASED, BIASED, AND WEIGHTED MULTIPLE LINEAR REGRESSION APPROACHES TO SUPPORT EDUCATIONAL POLICY IN THE IDENTIFICATION OF OUTLIER SCHOOLS (RIDGE REGRESSION, PREDICTION, RESIDUAL, EXPLANATION, NEEDS, ASSESSMENT) [D]. BIGELOW, ROBERT ASHLEY. 1984.
11. Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate [O]. Harvey J Motulsky, Ronald E Brown. 2006.
12. A New Algorithm for Detecting Outliers in Linear Regression [O]. Mehmet Hakan Satman. 2013.
13. Multiple Regression Technique for Detecting Outliers [R]. Leroy, A., Rousseeuw, P. 1984.
