
Computational intelligence techniques for missing data imputation


Abstract

Despite considerable advances in missing data imputation techniques over the last three decades, the problem of missing data remains largely unsolved. Many techniques have emerged in the literature as candidate solutions, including Expectation Maximisation (EM) and the combination of autoassociative neural networks and genetic algorithms (NN-GA). The merits of both techniques have been discussed at length in the literature, but they have never been compared with each other. This thesis contributes to knowledge by, firstly, conducting a comparative study of these two techniques. The significance of the difference in performance of the methods is presented. Secondly, predictive analysis methods suitable for the missing data problem are presented. The predictive analysis in this problem is aimed at determining whether the data in question are predictable and hence at helping to choose the estimation techniques accordingly. Thirdly, a novel treatment of missing data for online condition monitoring problems is presented. An ensemble of three autoencoders together with hybrid Genetic Algorithms (GA) and fast simulated annealing was used to approximate missing data. Several significant insights were deduced from the simulation results. It was deduced that, for the problem of missing data using computational intelligence approaches, the choice of optimisation method plays a significant role in prediction. Although it was observed that hybrid GA and Fast Simulated Annealing (FSA) can converge to the same search space and to almost the same values, they differ significantly in duration.
This unique contribution has demonstrated that particular attention has to be paid to the choice of optimisation techniques and their decision boundaries.

Another unique contribution of this work was not only to demonstrate that dynamic programming is applicable to the problem of missing data, but also to show that it is efficient in addressing that problem. An NN-GA model was built to impute missing data using the principle of dynamic programming. This approach makes it possible to modularise the problem of missing data for maximum efficiency. With the advancements in parallel computing, various modules of the problem could be solved by different processors working together in parallel. Furthermore, a method for imputing missing data in non-stationary time series that learns incrementally even when there is concept drift is proposed. This method works by measuring heteroskedasticity to detect concept drift and explores an online learning technique. New directions for research, in which missing data can be estimated for non-stationary applications, are opened by the introduction of this novel method. Thus, this thesis has uniquely opened the doors of research in this area. Many other methods need to be developed so that they can be compared with the unique existing approach proposed in this thesis.

Another novel technique for dealing with missing data in online condition monitoring problems was also presented and studied. The problem of classifying in the presence of missing data was addressed, where no attempts are made to recover the missing values. The problem domain was then extended to regression. The proposed technique performs better than the NN-GA approach, both in accuracy and in time efficiency during testing. The advantage of the proposed technique is that it eliminates the need for finding the best estimate of the data and hence saves time.
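
The abstract does not say how heteroskedasticity is measured for drift detection. As a rough illustration only, the sketch below assumes one simple possibility: comparing the sample variance of the most recent window of a series with that of the preceding window and raising a flag when the ratio is large. The function name `variance_ratio_drift`, the window length and the threshold are hypothetical choices, not taken from the thesis.

```python
import numpy as np

def variance_ratio_drift(series, window=50, threshold=3.0):
    """Flag time indices where the variance of the latest window differs from
    that of the preceding window by more than `threshold` times (either way)."""
    series = np.asarray(series, dtype=float)
    flagged = []
    for t in range(2 * window, len(series) + 1):
        ref = series[t - 2 * window : t - window]   # older reference window
        cur = series[t - window : t]                # most recent window
        v_ref, v_cur = ref.var(ddof=1), cur.var(ddof=1)
        # Symmetric ratio; the small floor avoids division by zero.
        ratio = max(v_ref, v_cur) / max(min(v_ref, v_cur), 1e-12)
        if ratio > threshold:
            flagged.append(t - 1)
    return flagged

# Synthetic example: the noise scale jumps from 1 to 4 at index 300.
rng = np.random.default_rng(1)
signal = np.concatenate([rng.normal(scale=1.0, size=300),
                         rng.normal(scale=4.0, size=300)])
alarms = variance_ratio_drift(signal)
print("first alarm at index:", alarms[0] if alarms else None)
```

In an online setting, an alarm of this kind could be used to trigger retraining or incremental updating of the imputation model; how the thesis couples detection to learning is not stated in the abstract.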
Lastly, instead of using complicated techniques to estimate missing values, an imputation approach based on rough sets is explored. Empirical results obtained using both real and synthetic data are given, and they provide valuable and promising insight into the problem of missing data. The work has confirmed that rough sets can be reliable for missing data estimation in larger, real databases.
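
The NN-GA idea named in the abstract can be illustrated with a minimal sketch: an autoassociative model is fitted to complete records, and a genetic algorithm then searches for the missing entries that minimise the model's reconstruction error. Everything concrete here is an assumption made for illustration: a linear autoencoder (PCA) stands in for the trained neural network, the data are synthetic, and the names `reconstruct` and `impute_ga` as well as the GA settings are hypothetical. It is not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic complete training data near a 2-D subspace of R^4 (illustrative only).
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 4))
X_train = latent @ mixing + 0.01 * rng.normal(size=(500, 4))

# Stand-in for a trained autoassociative network: a linear autoencoder (PCA).
mean = X_train.mean(axis=0)
_, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = vt[:2]  # shared encoder/decoder weights

def reconstruct(x):
    """Encode then decode a single record or a batch of records."""
    return (x - mean) @ components.T @ components + mean

def impute_ga(record, missing_idx, pop_size=60, generations=80, sigma=0.3):
    """Genetic search for missing entries minimising reconstruction error."""
    missing_idx = np.asarray(missing_idx)
    lo = X_train[:, missing_idx].min(axis=0)
    hi = X_train[:, missing_idx].max(axis=0)
    pop = rng.uniform(lo, hi, size=(pop_size, missing_idx.size))

    def fitness(candidates):
        full = np.tile(record, (len(candidates), 1))
        full[:, missing_idx] = candidates  # overwrite the NaN placeholders
        return np.sum((full - reconstruct(full)) ** 2, axis=1)

    for _ in range(generations):
        errors = fitness(pop)
        parents = pop[np.argsort(errors)[: pop_size // 2]]  # truncation selection
        a = parents[rng.integers(len(parents), size=pop_size)]
        b = parents[rng.integers(len(parents), size=pop_size)]
        children = np.where(rng.random(a.shape) < 0.5, a, b)  # uniform crossover
        children += rng.normal(scale=sigma, size=children.shape)  # mutation
        children[0] = parents[0]  # elitism: keep the best candidate unchanged
        pop = children
        sigma *= 0.95  # gradually narrow the mutation step
    return pop[np.argmin(fitness(pop))]

# Usage: hide one feature of a noise-free record and estimate it.
truth = np.array([1.5, -1.0]) @ mixing
record = truth.copy()
record[2] = np.nan
estimate = impute_ga(record, [2])
print("true value:", truth[2], "imputed value:", estimate[0])
```

Swapping the genetic search for another optimiser, such as simulated annealing, changes only the search loop; the objective stays the same, which is consistent with the abstract's observation that the optimisers mainly differ in run time.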
