Journal: American Journal of Mathematics and Statistics

Comparison of Outlier Detection Procedures in Multiple Linear Regressions


Abstract

Regression analysis has become one of the most widely used statistical tools for analyzing multifactor data. It is appealing because it provides a conceptually simple method for investigating functional relationships among variables. A relationship is expressed in the form of an equation or model connecting the response (dependent) variable and one or more explanatory (predictor) variables. A major problem that statisticians confront in regression analysis is the presence of outliers in the data. An outlier is an observation that lies outside the overall pattern of a distribution. In other words, it is a point that falls more than 1.5 times the interquartile range above the third quartile or below the first quartile. Several statistics are available to detect whether outliers are present in data. In this study, a simulation was conducted to investigate the performance of DFFITS, Cook's distance, and Mahalanobis distance at different proportions of outliers (10%, 20%, and 30%) and for various sample sizes (10, 30, and 100), with outliers placed in the first, the second, or both independent variables. The data were generated in R from a normal distribution, while the outliers were drawn from a uniform distribution. Findings: For small and medium sample sizes at the 10% level of outliers, Mahalanobis distance should be employed for its accuracy in detecting outliers. For small, medium, and large sample sizes with a higher percentage of outliers, DFFITS should be employed. For small, medium, and large sample sizes, DFFITS should be used to detect an outlier signal irrespective of the percentage of outliers in the data set. For small samples with a low percentage of outliers, Mahalanobis distance should be employed for ease of computation.
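
The kind of simulation the abstract describes can be sketched roughly as follows. This is an illustrative Python/NumPy sketch, not the authors' R code: the regression coefficients, the noise scale, the uniform range for the contaminated values, and the cutoffs (4/n for Cook's distance, 2·sqrt(p/n) for DFFITS, and the 0.975 chi-square quantile for the squared Mahalanobis distance) are all assumptions chosen for illustration, since the abstract does not state them. The helper name `regression_diagnostics` is likewise hypothetical.

```python
import numpy as np


def regression_diagnostics(X, y):
    """Return (cooks_d, dffits, mahalanobis_sq) for an OLS fit of y on X.

    X is an (n, k) array of predictors WITHOUT the intercept column.
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])       # design matrix with intercept
    p = A.shape[1]                             # number of fitted coefficients
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta

    # Leverages: diagonal of the hat matrix A (A'A)^{-1} A'
    h = np.einsum("ij,jk,ik->i", A, np.linalg.inv(A.T @ A), A)
    s2 = resid @ resid / (n - p)               # residual variance estimate

    # Cook's distance
    cooks = resid**2 / (p * s2) * h / (1 - h) ** 2

    # Leave-one-out variance, used for externally studentized residuals
    s2_loo = ((n - p) * s2 - resid**2 / (1 - h)) / (n - p - 1)
    t = resid / np.sqrt(s2_loo * (1 - h))
    dffits = t * np.sqrt(h / (1 - h))

    # Squared Mahalanobis distance of each predictor row from the predictor mean
    centered = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    md2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)
    return cooks, dffits, md2


# --- One simulated data set: n = 30, 10% outliers in the first predictor ---
rng = np.random.default_rng(0)
n, k = 30, 2
X = rng.normal(size=(n, k))                    # clean predictors from N(0, 1)
# Assumed model and noise level (not given in the abstract)
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

n_out = n // 10                                # 10% contamination
# Assumed uniform range for the outlying values (not given in the abstract)
X[:n_out, 0] = rng.uniform(8.0, 12.0, size=n_out)

cooks, dffits, md2 = regression_diagnostics(X, y)

p = k + 1
flag_cook = np.flatnonzero(cooks > 4 / n)
flag_dffits = np.flatnonzero(np.abs(dffits) > 2 * np.sqrt(p / n))
# For 2 degrees of freedom the chi-square quantile is -2*ln(1 - q); q = 0.975
flag_md = np.flatnonzero(md2 > -2 * np.log(0.025))

print("contaminated rows:", list(range(n_out)))
print("Cook's distance flags:", flag_cook.tolist())
print("DFFITS flags:", flag_dffits.tolist())
print("Mahalanobis flags:", flag_md.tolist())
```

Repeating this over many replications, sample sizes, and contamination levels, and counting how often the contaminated rows are flagged, would give detection rates comparable in spirit to those the study reports. Note that the Mahalanobis distance here uses the ordinary sample mean and covariance, so clustered outliers can partly mask one another.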
