Universal Linear Fit Identification: A Method Independent of Data Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation

Abstract

Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric, robust linear fit identification method for time series. The method uses the indicator $2/n$ to identify linear fit, where $n$ is the number of terms in a series. For a series that follows a linear fit exactly, the ratio $R_{\max} = (a_{\max} - a_{\min}) / (S_n - n\,a_{\min})$ and the ratio $R_{\min} = (a_{\max} - a_{\min}) / (n\,a_{\max} - S_n)$ are always equal to $2/n$, where $a_{\max}$ is the maximum element, $a_{\min}$ is the minimum element and $S_n$ is the sum of all elements. If a series expected to follow $y = c$ contains data that do not agree with the $y = c$ form, $R_{\max} > 2/n$ and $R_{\min} > 2/n$ imply that the maximum and minimum elements, respectively, do not agree with linear fit. We define the threshold values for outlier and noise detection as $(2/n)(1 + k_1)$ and $(2/n)(1 + k_2)$, respectively, where $k_1 > k_2$ and $0 \le k_1 \le n/2 - 1$. Given this relation and a transformation technique that transforms data into the form $y = c$, we show that removing all data that do not agree with linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and the nature of the distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over existing linear fit methods. Since a perfect linear relation between two variables is impossible in the real world, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit even when the percentage of data agreeing with linear fit is less than 50% and the deviation of the data that do not agree with linear fit is very small, of the order of $\pm 10^{-4}\%$. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process.
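The two ratios named in the abstract can be illustrated with a minimal Python sketch. It assumes the indicator is 2/n, that R max is (a max − a min) divided by (S n − n·a min), and that R min is (a max − a min) divided by (n·a max − S n). The function names `ratios` and `exceeds_threshold` are hypothetical and not taken from the paper, and the sketch covers only the ratio check, not the transformation or iterative removal steps the abstract mentions. Exact rational arithmetic is used because the abstract notes that insufficient numerical accuracy can cause incorrect detections.

```python
from fractions import Fraction


def ratios(series):
    """Return (R_max, R_min) for a sequence of numbers, following the abstract's definitions."""
    n = len(series)
    a_max, a_min, s_n = max(series), min(series), sum(series)
    spread = a_max - a_min
    if spread == 0:
        # Every element is identical, so both ratios would be 0/0.
        raise ValueError("all elements are equal; the ratios are undefined")
    r_max = Fraction(spread) / (Fraction(s_n) - Fraction(a_min) * n)
    r_min = Fraction(spread) / (Fraction(a_max) * n - Fraction(s_n))
    return r_max, r_min


def exceeds_threshold(series, k):
    """Compare both ratios with the threshold (2/n) * (1 + k).

    Returns (max_flagged, min_flagged). The parameter k stands in for the
    abstract's k1 or k2; how to choose it is not specified here.
    """
    n = len(series)
    threshold = Fraction(2, n) * (1 + k)
    r_max, r_min = ratios(series)
    return r_max > threshold, r_min > threshold


# An evenly spaced (linear) series: both ratios equal 2/n = 2/5.
linear = [3, 5, 7, 9, 11]
print(ratios(linear))

# Replacing the last term with a large value pushes R_max above 2/n,
# which marks the maximum element as disagreeing with the linear fit.
with_high_value = [3, 5, 7, 9, 31]
print(ratios(with_high_value))
print(exceeds_threshold(with_high_value, Fraction(1, 10)))
```

In the second series only the first flag is raised, which matches the abstract's statement that R max speaks for the maximum element and R min for the minimum element.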
