Journal: IEEE Transactions on Knowledge and Data Engineering

E-Tree: An Efficient Indexing Structure for Ensemble Models on Data Streams

Abstract

Ensemble learning is a common tool for data stream classification, mainly because of its inherent advantages in handling large volumes of stream data and concept drift. Studies to date have focused primarily on building accurate ensemble models from stream data. However, a linear scan of a large number of base classifiers in the ensemble during prediction incurs significant response-time costs, which keeps ensemble learning from being practical for many real-world, time-critical data stream applications, such as Web traffic stream monitoring, spam detection, and intrusion detection. In these applications, data streams usually arrive at rates of gigabytes per second, and each stream record must be classified in a timely manner. To address this problem, we propose a novel Ensemble-tree (E-tree for short) indexing structure that organizes all base classifiers in an ensemble for fast prediction. On one hand, E-trees treat ensembles as spatial databases and employ an R-tree-like height-balanced structure to reduce the expected prediction time from linear to sub-linear complexity. On the other hand, E-trees can be updated automatically by continuously integrating new classifiers and discarding outdated ones, adapting well to new trends and patterns in the underlying data streams. Theoretical analysis and empirical studies on both synthetic and real-world data streams demonstrate the performance of our approach.
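The idea of treating an ensemble as a spatial database can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each base classifier has already been broken down into axis-aligned rectangular decision rules, which the abstract does not spell out. All names here (`Rule`, `Node`, `build`, `predict`, `predict_linear`) are hypothetical, and the tree is built by a simple sort-and-pack bulk load rather than true R-tree insertion with node splits. Prediction becomes a point query that descends only into nodes whose bounding box contains the record, followed by a majority vote over the matching rules.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

# A box is (lower corner, upper corner) in feature space.
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]


@dataclass
class Rule:
    box: Box    # region of feature space the rule covers (assumed rectangular)
    label: str  # class predicted inside that region


@dataclass
class Node:
    box: Box                                      # bounding box of everything below
    children: list = field(default_factory=list)  # Nodes, or Rules at a leaf
    is_leaf: bool = True


def contains(box: Box, x: Sequence[float]) -> bool:
    lo, hi = box
    return all(l <= v <= h for l, v, h in zip(lo, x, hi))


def bounding_box(boxes: List[Box]) -> Box:
    los, his = zip(*boxes)
    return (tuple(map(min, zip(*los))), tuple(map(max, zip(*his))))


def build(rules: List[Rule], fanout: int = 4) -> Node:
    """Simplified bulk load: sort rules, pack them into groups of `fanout`,
    then pack the resulting nodes level by level until one root remains."""
    if not rules:
        raise ValueError("need at least one rule")
    items = sorted(rules, key=lambda r: r.box[0])
    level = [
        Node(bounding_box([r.box for r in items[i:i + fanout]]),
             items[i:i + fanout], True)
        for i in range(0, len(items), fanout)
    ]
    while len(level) > 1:
        level = [
            Node(bounding_box([n.box for n in level[i:i + fanout]]),
                 level[i:i + fanout], False)
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def predict(root: Node, x: Sequence[float]) -> Optional[str]:
    """Point query: visit only subtrees whose bounding box contains x."""
    votes: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        if not contains(node.box, x):
            continue  # prune the whole subtree
        if node.is_leaf:
            votes.update(r.label for r in node.children if contains(r.box, x))
        else:
            stack.extend(node.children)
    return votes.most_common(1)[0][0] if votes else None


def predict_linear(rules: List[Rule], x: Sequence[float]) -> Optional[str]:
    """Baseline: scan every rule of every base classifier."""
    votes = Counter(r.label for r in rules if contains(r.box, x))
    return votes.most_common(1)[0][0] if votes else None
```

Both functions return the same majority label whenever the vote is not tied; the indexed version simply skips subtrees that cannot match. The sketch leaves out the online part of the proposal, namely inserting rules from newly trained classifiers and deleting rules from outdated ones.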
