Learning Individual Models for Imputation

Abstract

Missing numerical values are prevalent, e.g., owing to unreliable sensor readings, collection, and transmission among heterogeneous sources. Unlike categorical data imputation over a limited domain, numerical values suffer from two issues: (1) the sparsity problem: an incomplete tuple may not have sufficient complete neighbors sharing the same or similar values for imputation, owing to the (almost) infinite domain; (2) the heterogeneity problem: different tuples may not fit the same (regression) model. In this study, inspired by conditional dependencies that hold over certain tuples rather than the whole relation, we propose to learn a regression model individually for each complete tuple together with its neighbors. Our IIM, Imputation via Individual Models, thus no longer relies on the k complete neighbors sharing similar values for imputation, but utilizes their regression results from the learned individual (not necessarily the same) models. Remarkably, we show that some existing methods are special cases of our IIM, under the extreme settings of the number ℓ of learning neighbors considered in individual learning. In this sense, a proper number ℓ of neighbors is essential for learning the individual models (avoiding over-fitting or under-fitting). We propose to adaptively learn individual models over various numbers ℓ of neighbors for different complete tuples. By devising efficient incremental computation, the time complexity of learning a model reduces from linear to constant. Experiments on real data demonstrate that our IIM with adaptive learning achieves higher imputation accuracy than the existing approaches.
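The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a linear regression model per complete tuple, Euclidean distance for neighbor search, a tiny ridge term for numerical stability, and a plain average to combine the k candidate values; the abstract specifies none of these. The names `fit_individual_models`, `impute`, `ell`, and `ridge` are hypothetical.

```python
import numpy as np


def fit_individual_models(X, y, ell, ridge=1e-8):
    """Fit one linear model per complete tuple on its ell nearest complete tuples.

    X: (n, m) complete attribute values; y: (n,) values of the attribute
    that is missing in the incomplete tuples. Returns an (n, m + 1) array
    of coefficients, intercept first. The ridge term is an assumption
    added only to keep the linear system solvable.
    """
    n, m = X.shape
    A = np.hstack([np.ones((n, 1)), X])  # prepend an intercept column
    models = np.empty((n, m + 1))
    for i in range(n):
        dist = np.linalg.norm(X - X[i], axis=1)
        idx = np.argsort(dist)[:ell]  # learning neighbors, tuple i included
        Ai, yi = A[idx], y[idx]
        gram = Ai.T @ Ai + ridge * np.eye(m + 1)
        models[i] = np.linalg.solve(gram, Ai.T @ yi)
    return models


def impute(x, X, models, k):
    """Impute the missing value of a tuple with complete attributes x.

    Each of the k nearest complete tuples applies its own model to x,
    and the k candidates are averaged (the plain average is an assumption).
    """
    dist = np.linalg.norm(X - x, axis=1)
    idx = np.argsort(dist)[:k]
    candidates = models[idx] @ np.concatenate(([1.0], x))
    return float(candidates.mean())


if __name__ == "__main__":
    # Toy heterogeneous data: y = 2x on one cluster, y = 50 - x on the other.
    X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]])
    y = np.array([0.0, 2.0, 4.0, 6.0, 40.0, 39.0, 38.0, 37.0])
    models = fit_individual_models(X, y, ell=4)
    print(impute(np.array([1.5]), X, models, k=2))
    print(impute(np.array([11.5]), X, models, k=2))
```

On this toy data a single global regression would fit neither cluster well, whereas the individual models recover each local trend, so the two imputed values should land near 3.0 and 38.5.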

Bibliographic details

Source
IEEE International Conference on Data Engineering | 2019 | 721p | 12 pages
Authors
Aoqian Zhang; Shaoxu Song; Yu Sun; Jianmin Wang
Original format
PDF
Chinese Library Classification
Data processing; data processing systems
Keywords
Data models; Adaptation models; Computational modeling; Predictive models; Numerical models; Aggregates; Regression tree analysis
