SKIF: A Data Imputation Framework for Concept Drifting Data Streams

Abstract

Missing data commonly occur in many applications. While many data imputation methods exist to handle missing data in databases, these methods share several common difficulties when applied to concept drifting data streams. First, because stream data arrive continuously and in large volumes, we cannot retain all stream records to form a candidate pool for missing value estimation, as most existing methods do. Second, even if we could maintain all complete stream records using a summary structure, concept drift would make some of this information obsolete and thus degrade imputation accuracy. Third, data streams require a fast yet accurate algorithm to find the most similar data for imputation. Fourth, because stream data are collected in dynamic and complex environments, the missing rate of most stream data may be much higher than that of databases, so the imputation method must be able to handle high missing rates. To tackle these challenges, we propose a Streaming k-Nearest-Neighbors Imputation Framework (SKIF) for concept drifting data streams. To handle concept drift and large data volumes, SKIF first summarizes historical complete records into micro-resources (high-level statistical data structures) and maintains these micro-resources in a candidate pool as benchmark data. SKIF then employs a novel hybrid-kNN imputation procedure, which uses a hybrid similarity search mechanism to efficiently find the most similar micro-resources in the large-scale candidate pool. Experimental results demonstrate the effectiveness of the proposed SKIF framework for data stream imputation tasks.
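
The abstract does not define micro-resources or the hybrid-kNN search in detail, so the Python sketch below only illustrates the general idea of kNN imputation against a bounded pool of summaries, and it is not the authors' implementation. Assumptions: each micro-resource stores a record count, a per-attribute linear sum and the timestamp of its last update; a complete record joins its nearest micro-resource if it lies within a hypothetical `radius`, otherwise it starts a new one; concept drift is approximated by evicting the least recently updated micro-resource once the pool reaches a hypothetical `capacity`; and neighbors are found by a plain linear scan over centroids on the observed attributes, a simple stand-in for SKIF's hybrid similarity search. `MicroResource`, `CandidatePool`, `add_complete` and `impute` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class MicroResource:
    """Hypothetical micro-resource: summary statistics of several complete records."""
    count: int               # number of records absorbed
    linear_sum: List[float]  # per-attribute sum of the absorbed records
    last_update: int         # stream timestamp of the most recently absorbed record

    def centroid(self) -> List[float]:
        return [s / self.count for s in self.linear_sum]

    def absorb(self, record: Sequence[float], t: int) -> None:
        self.count += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, record)]
        self.last_update = t


def _dist(a: Sequence[Optional[float]], b: Sequence[float], dims: Sequence[int]) -> float:
    """Euclidean distance restricted to the given (observed) attribute indices."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in dims))


class CandidatePool:
    """Bounded pool of micro-resources (assumed design, not SKIF's exact structure)."""

    def __init__(self, capacity: int, radius: float) -> None:
        self.capacity = capacity  # maximum number of micro-resources kept
        self.radius = radius      # max distance for a record to join an existing micro-resource
        self.items: List[MicroResource] = []

    def _nearest(self, record: Sequence[Optional[float]], dims: Sequence[int], k: int) -> List[MicroResource]:
        # Plain linear scan over centroids; a stand-in for SKIF's hybrid similarity search.
        return sorted(self.items, key=lambda m: _dist(record, m.centroid(), dims))[:k]

    def add_complete(self, record: Sequence[float], t: int) -> None:
        dims = range(len(record))
        nearest = self._nearest(record, dims, k=1)
        if nearest and _dist(record, nearest[0].centroid(), dims) <= self.radius:
            nearest[0].absorb(record, t)  # merge into the closest micro-resource
            return
        if len(self.items) >= self.capacity:
            # Assumed drift handling: drop the micro-resource that has been idle the longest.
            idx = min(range(len(self.items)), key=lambda j: self.items[j].last_update)
            del self.items[idx]
        self.items.append(MicroResource(1, list(record), t))

    def impute(self, record: Sequence[Optional[float]], k: int = 3) -> List[float]:
        """Fill None entries with the count-weighted mean of the k nearest centroids."""
        observed = [i for i, x in enumerate(record) if x is not None]
        neighbors = self._nearest(record, observed, k)
        if not neighbors:
            raise ValueError("candidate pool is empty")
        total = sum(m.count for m in neighbors)
        # sum_j(count_j * centroid_j[i]) / total equals sum_j(linear_sum_j[i]) / total
        return [x if x is not None else sum(m.linear_sum[i] for m in neighbors) / total
                for i, x in enumerate(record)]


if __name__ == "__main__":
    pool = CandidatePool(capacity=100, radius=1.0)
    for t, rec in enumerate([[1.0, 2.0, 3.0], [1.2, 2.2, 3.2], [10.0, 10.0, 10.0]]):
        pool.add_complete(rec, t)
    print(pool.impute([1.1, None, 3.1], k=1))
```

Evicting by last update time is only one simple way to let stale summaries age out; applying a decay weight to `count` and `linear_sum` would be another common choice. The abstract does not say which mechanism SKIF uses.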

Bibliographic Record

Source
ACM conference on information and knowledge management | 2010 | 4 pages
Authors
Peng Zhang; Xingquan Zhu; Jianlong Tan; Li Guo
Original format
PDF
CLC classification
Information processing (information handling)
Keywords
data streams; data imputation; concept drifting

Similar Literature

1. Dynamic Clustering Forest: An ensemble framework to efficiently classify textual data stream with concept drift [J]. Song Ge, Ye Yunming, Zhang Haijun. Information Sciences: An International Journal, 2016.

2. A grid density based framework for classifying streaming data in the presence of concept drift [J]. Sethi Tegjyot Singh, Kantardzic Mehmed, Hu Hanquing. Journal of Intelligent Information Systems, 2016, no. 1.

3. Concept Drift Detection in Data Stream Clustering and its Application on Weather Data [J]. International Journal of Agricultural and Environmental Information Systems, 2020, no. 1.

4. SKIF: A Data Imputation Framework for Concept Drifting Data Streams [C]. Peng Zhang, Xingquan Zhu, Jianlong Tan. CIKM '10; ACM conference on information and knowledge management, 2011.

5. The GC3 framework grid density based clustering for classification of streaming data with concept drift [D]. Sethi Tegjyot Singh, 2013.

6. Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance [O]. Yange Sun, Meng Li, Lei Li, 2021.

7. A general framework for mining concept-drifting data streams with skewed distributions [O]. Jing Gao, Wei Fan, Jiawei Han, 2007.
