Journal: Quality and Reliability Engineering International

A Little-known Robust Estimator of the Correlation Coefficient and Its Use in a Robust Graphical Test for Bivariate Normality with Applications in the Aluminium Industry


Abstract

Industrial and business data often contain outliers. Outliers can arise from unclear procedures for production tasks or measurement, operators who do not follow procedures, failures in production or measurement equipment, the wrong type of raw material, failure in raw material, registration errors, or the fact that the response is influenced by many factors other than the available explanatory variables. Often there is no identifiable cause for the outliers, and they are considered an intrinsic part of the dataset. Since data are often considered pairwise, and more methods for analysing pairwise data are available if the data-generating process can be modelled by a bivariate normal distribution, there is a need for a straightforward test of bivariate normality that is robust against outliers. This paper looks at a graphical test, based on probability plotting, for assessing whether it is reasonable to assume that a bivariate dataset stems from an approximately bivariate normal distribution when the possibility of outliers is taken into account. The robust graphical (Robug) test uses a little-known estimator of the correlation coefficient, which is demonstrated to be robust against outliers. The graphical test is illustrated using data from our practical work. First, the little-known robust estimator of the correlation parameter in the bivariate normal distribution is compared with the traditional estimator (the product moment correlation coefficient, often called Pearson's r), with Spearman's rank correlation coefficient and with Kendall's tau. The little-known estimator is a transformation of Kendall's tau. The comparison is based partly on theory and partly on the simulation of observations from the bivariate normal distribution.
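As a rough illustration of the kind of simulation comparison described above, the sketch below draws bivariate normal observations and computes Pearson's r, Kendall's tau and a transformation of Kendall's tau, with and without a few planted outliers. The abstract does not state the transformation; the sketch assumes the standard bivariate normal relation rho = sin(pi * tau / 2). The function names, sample size, seed and outlier positions are illustrative choices and are not taken from the paper.

```python
import math
import random


def pearson_r(x, y):
    """Product moment correlation coefficient (Pearson's r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def tau_to_rho(tau):
    """Assumed transformation: rho = sin(pi * tau / 2) under bivariate normality."""
    return math.sin(math.pi * tau / 2)


def simulate_bvn(n, rho, rng):
    """Draw n standard bivariate normal pairs with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return xs, ys


rng = random.Random(0)          # fixed seed, illustrative only
rho = 0.7                       # true correlation used in the simulation
x, y = simulate_bvn(300, rho, rng)

# Plant six outliers far from the bulk, pulling against the true correlation.
x_out = x + [8.0 + k for k in range(6)]
y_out = y + [-8.0 - k for k in range(6)]

tau_clean = kendall_tau(x, y)
est_clean = {"pearson": pearson_r(x, y),
             "kendall_tau": tau_clean,
             "tau_transformed": tau_to_rho(tau_clean)}

tau_dirty = kendall_tau(x_out, y_out)
est_dirty = {"pearson": pearson_r(x_out, y_out),
             "kendall_tau": tau_dirty,
             "tau_transformed": tau_to_rho(tau_dirty)}

for label, est in (("clean", est_clean), ("with outliers", est_dirty)):
    print(label, {k: round(v, 3) for k, v in est.items()})
```

In a run like this, the untransformed tau sits well below the true correlation even on clean data, while the planted outliers move Pearson's r much more than they move the transformed tau. A single simulated dataset only illustrates the idea; it does not reproduce the paper's bias and root mean square error results.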
Our conclusions are that when outliers are not an issue, Pearson's r, Spearman's coefficient and the transformation of Kendall's tau do not perform very differently in terms of bias, standard deviation and root mean square error, while Kendall's tau is too biased to be used for the purpose in question. Concerning robustness to outliers, Pearson's r is inferior to the other estimators. It seems likely that the transformation of Kendall's tau, which is far less well known than Pearson's r and Spearman's rank correlation coefficient, is at least as good as Spearman's coefficient when the possibility of outliers must be taken into consideration.

Business and industrial improvement often requires the use of information that can be extracted from multivariate data. When the multivariate normal (MVN) distribution can be used to model the data-generating process, more methods are generally available for analysing the data and providing predictions. Many datasets are naturally approximately MVN, so that deviations from normality imply special causes. Thus, tests for MVN facilitate the detection of outliers. Considerable insight is gained by looking at the data singly or pairwise. Pairwise datasets that come from a process that can be modelled as MVN can be modelled by a bivariate normal distribution. The robust graphical test in this paper is therefore also useful for assessing whether a multivariate dataset comes from an approximate MVN distribution.
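The abstract does not describe how the Robug plot is constructed, so the following is only a generic sketch of a probability plot for bivariate normality, not the authors' procedure. It assumes each variable is centred and scaled by its median and scaled MAD, that the correlation is estimated by sin(pi * tau / 2), and that the resulting squared distances are compared with chi-square quantiles on 2 degrees of freedom, which have the closed form -2 ln(1 - p). All function names and the plotting positions (i - 0.5) / n are hypothetical choices.

```python
import math
import random
import statistics


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def robust_center_scale(v):
    """Median and MAD scaled by 1.4826 (consistent for the normal sd)."""
    med = statistics.median(v)
    mad = statistics.median([abs(a - med) for a in v])
    return med, 1.4826 * mad


def chi2_plot_points(x, y):
    """Return (theoretical quantile, ordered squared distance) pairs.

    Assumes |rho_hat| < 1; a perfectly monotone sample would divide by zero.
    """
    n = len(x)
    mx, sx = robust_center_scale(x)
    my, sy = robust_center_scale(y)
    rho_hat = math.sin(math.pi * kendall_tau(x, y) / 2)
    d2 = []
    for a, b in zip(x, y):
        u, v = (a - mx) / sx, (b - my) / sy
        d2.append((u * u - 2 * rho_hat * u * v + v * v) / (1 - rho_hat ** 2))
    d2.sort()
    # Chi-square(2) quantile at probability p is -2 * ln(1 - p).
    q = [-2 * math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(q, d2))


# Illustrative data: simulated bivariate normal pairs plus one planted outlier.
rng = random.Random(1)
rho = 0.7
xs, ys = [], []
for _ in range(300):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    xs.append(z1)
    ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
xs.append(8.0)
ys.append(-8.0)

points = chi2_plot_points(xs, ys)
print("largest theoretical quantile:", round(points[-1][0], 2))
print("largest squared distance:   ", round(points[-1][1], 2))
```

Plotting the second coordinate against the first gives the probability plot. Under approximate bivariate normality the points should lie near the line through the origin with slope 1, and gross outliers show up as points far above that line in the upper tail.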
