TRIP: An Interactive Retrieving-Inferring Data Imputation Approach

Li Zhixu; Qin Lu; Cheng Hong; Zhang Xiangliang; Zhou Xiaofang

Abstract

Data imputation aims at filling in missing attribute values in databases. Most existing imputation methods for string attribute values are inferring-based approaches, which usually fail to reach a high imputation recall because they infer missing values only from the complete part of the data set. Recently, some retrieving-based methods have been proposed to retrieve missing values from external resources such as the World Wide Web; these tend to reach a much higher imputation recall, but inevitably incur a large overhead by issuing a large number of search queries. In this paper, we investigate the interaction between the inferring-based methods and the retrieving-based methods. We show that retrieving a small number of selected missing values can greatly improve the imputation recall of the inferring-based methods. With this intuition, we propose an inTeractive Retrieving-Inferring data imPutation approach (TRIP), which performs retrieving and inferring alternately to fill in missing attribute values in a data set. To ensure high recall at minimum cost, TRIP faces the challenge of selecting the smallest number of missing values for retrieving so as to maximize the number of inferable values. Our proposed solution is able to identify an optimal retrieving-inferring scheduling scheme in deterministic data imputation, and the optimality of the generated scheme is theoretically analyzed with proofs. We also show with an example that the optimal scheme cannot feasibly be achieved in -constrained stochastic data imputation (-SDI), but our proposed solution still identifies an expected-optimal scheme in -SDI. Extensive experiments on four data collections show that TRIP retrieves on average 20 percent of the missing values and achieves the same high recall as the retrieving-based approach.
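
The alternation between retrieving and inferring described in the abstract can be illustrated with a small sketch. This is not the authors' scheduling algorithm and makes no optimality claim: it is a simple greedy heuristic under an assumed model in which each missing value can be inferred once all values in at least one of its premise sets are known. The names `infer_closure`, `greedy_schedule`, and the `rules` dictionary layout are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set

# Assumed model (not from the paper): rules[target] lists premise sets;
# target becomes inferable once every value in any one premise set is known.
Rules = Dict[str, List[FrozenSet[str]]]


def infer_closure(known: Set[str], rules: Rules) -> Set[str]:
    """Return all values derivable from `known` by applying rules to a fixed point."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for target, premise_sets in rules.items():
            if target not in known and any(p <= known for p in premise_sets):
                known.add(target)
                changed = True
    return known


def greedy_schedule(missing: Set[str], known: Set[str], rules: Rules) -> List[str]:
    """Greedy heuristic: infer as much as possible, then retrieve the single
    missing value that unlocks the most inferences, and repeat."""
    known = infer_closure(known, rules)
    retrieved: List[str] = []
    while missing - known:
        candidates = sorted(missing - known)  # sorted for deterministic tie-breaking
        best = max(candidates, key=lambda v: len(infer_closure(known | {v}, rules)))
        retrieved.append(best)  # stands in for an external lookup such as a web query
        known = infer_closure(known | {best}, rules)
    return retrieved


if __name__ == "__main__":
    # Hypothetical toy data: b depends on a, c on b, d on c and x.
    toy_rules: Rules = {
        "b": [frozenset({"a"})],
        "c": [frozenset({"b"})],
        "d": [frozenset({"c", "x"})],
    }
    print(greedy_schedule({"a", "b", "c", "d"}, {"x"}, toy_rules))
```

In this toy case a single retrieval of `a` makes the remaining three values inferable, which mirrors the intuition that a few well-chosen retrievals can stand in for many search queries. A greedy choice like this can be suboptimal in general; the paper's own scheme is what carries the optimality analysis.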

Bibliographic record

Source
IEEE Transactions on Knowledge and Data Engineering, 2015, No. 9, pp. 2550-2563 (14 pages)
Authors
Li Zhixu; Qin Lu; Cheng Hong; Zhang Xiangliang; Zhou Xiaofang
Author affiliation
School of Computer Science and Technology, Soochow University, Suzhou, China
Original format: PDF
Language: English
Keywords
Data imputation; data repairing; interactive retrieving-inferring
