Published in: IEEE International Conference on Data Engineering

Learning Individual Models for Imputation

Abstract

Missing numerical values are prevalent, e.g., owing to unreliable sensor readings, collection, and transmission among heterogeneous sources. Unlike imputation of categorical data over a limited domain, numerical values suffer from two issues: (1) the sparsity problem: owing to the (almost) infinite domain, an incomplete tuple may not have enough complete neighbors sharing the same or similar values for imputation; (2) the heterogeneity problem: different tuples may not fit the same (regression) model. In this study, inspired by conditional dependencies, which hold over certain tuples rather than the whole relation, we propose to learn a regression model individually for each complete tuple together with its neighbors. Our IIM, Imputation via Individual Models, thus no longer relies on the k complete neighbors sharing similar values for imputation, but instead uses the regression results produced by their learned individual (not necessarily identical) models. Remarkably, we show that some existing methods are special cases of IIM under extreme settings of the number ℓ of learning neighbors considered in individual learning. In this sense, a proper number ℓ of neighbors is essential for learning the individual models (avoiding over-fitting or under-fitting). We propose to adaptively learn individual models over various numbers ℓ of neighbors for different complete tuples. By devising efficient incremental computation, the time complexity of learning a model is reduced from linear to constant. Experiments on real data demonstrate that IIM with adaptive learning achieves higher imputation accuracy than existing approaches.
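The two-phase idea in the abstract (learn one regression model per complete tuple from its ℓ neighbors, then impute from the models of the k complete neighbors of an incomplete tuple) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the use of Euclidean distance, ordinary least squares with an intercept as the regression model, and a plain average to combine the k candidate values. The abstract does not specify how candidates are combined, and the adaptive choice of ℓ and the incremental computation are not shown.

```python
import numpy as np

def nearest(X, x, count):
    """Indices of the `count` rows of X closest to x (Euclidean distance)."""
    dist = np.linalg.norm(X - x, axis=1)
    return np.argsort(dist)[:count]

def learn_individual_models(X, y, ell):
    """Fit one linear model per complete tuple from its ell nearest neighbors.

    X holds the observed attributes of the complete tuples, y the attribute
    that is missing in incomplete tuples. Returns an (n, m + 1) array whose
    row i holds the coefficients (and intercept) of tuple i's own model.
    """
    A = np.hstack([X, np.ones((len(X), 1))])  # append an intercept column
    models = []
    for i in range(len(X)):
        idx = nearest(X, X[i], ell)  # tuple i is its own nearest neighbor
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        models.append(coef)
    return np.array(models)

def impute(X, models, x, k):
    """Impute the missing attribute of x from its k complete neighbors.

    Each neighbor's individual model predicts a candidate value for x.
    Averaging the candidates is an assumption of this sketch.
    """
    idx = nearest(X, x, k)
    candidates = models[idx] @ np.append(x, 1.0)
    return candidates.mean()

# Synthetic, exactly linear data (hypothetical, for demonstration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(30, 2))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0

models = learn_individual_models(X, y, ell=6)
x_missing = np.array([4.0, 5.0])  # observed attributes of an incomplete tuple
estimate = impute(X, models, x_missing, k=3)
print(estimate)
```

On this synthetic data every local model recovers the same underlying linear relation, so the example only checks the mechanics. The heterogeneity the abstract describes would appear when different regions of the data follow different relations, in which case the per-tuple models differ and the choice of ℓ matters.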
