Mining Complex Models from Arbitrarily Large Databases in Constant Time

Abstract

In this paper we propose a scaling-up method that is applicable to essentially any induction algorithm based on discrete search. The result of applying the method to an algorithm is that its running time becomes independent of the size of the database, while the decisions made are essentially identical to those that would be made given infinite data. The method works within pre-specified memory limits and, as long as the data is iid, only requires accessing it sequentially. It gives anytime results, and can be used to produce batch, stream, time-changing and active-learning versions of an algorithm. We apply the method to learning Bayesian networks, developing an algorithm that is faster than previous ones by orders of magnitude, while achieving essentially the same predictive performance. We observe these gains on a series of large databases generated from benchmark networks, on the KDD Cup 2000 e-commerce data, and on a Web log containing 100 million requests.
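The abstract does not spell out the decision rule that lets the search stop after a bounded number of examples. In this line of work the usual instantiation is a Hoeffding-style concentration bound: the algorithm scans examples sequentially and commits to the empirically best candidate at a search step as soon as its lead over the runner-up exceeds the bound, so that the choice matches the infinite-data choice with probability at least 1 − δ. The sketch below illustrates that idea only; the names `Candidate`, `hoeffding_bound`, `choose_when_confident`, and `score_fn` are illustrative and not taken from the paper.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    """One alternative at a discrete search step (e.g. an edge to add to a Bayesian network)."""
    name: str
    score_sum: float = 0.0  # running sum of per-example scores
    n: int = 0              # number of examples this candidate has been scored on

    def update(self, example_score: float) -> None:
        self.score_sum += example_score
        self.n += 1

    @property
    def mean(self) -> float:
        return self.score_sum / self.n if self.n else 0.0


def hoeffding_bound(score_range: float, n: int, delta: float) -> float:
    """With probability at least 1 - delta, the mean of n i.i.d. scores whose range
    has width score_range lies within this distance of the true mean."""
    return math.sqrt(score_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def choose_when_confident(candidates, stream, score_fn, score_range,
                          delta=1e-6, max_examples=1_000_000):
    """Scan the data sequentially and stop as soon as the leading candidate's advantage
    over the runner-up exceeds the Hoeffding bound (or an example cap is hit).
    The number of examples needed depends on delta and the score gap between
    candidates, not on the total size of the database."""
    n = 0
    for example in stream:
        n += 1
        for c in candidates:
            c.update(score_fn(c, example))
        best, runner_up = sorted(candidates, key=lambda c: c.mean, reverse=True)[:2]
        if best.mean - runner_up.mean > hoeffding_bound(score_range, n, delta) or n >= max_examples:
            return best, n
    return max(candidates, key=lambda c: c.mean), n


if __name__ == "__main__":
    # Toy usage: two hypothetical candidates whose per-example scores differ by 0.05 on average.
    cands = [Candidate("add edge A->B"), Candidate("add edge B->A")]
    toy_stream = (random.random() for _ in range(10 ** 9))  # stands in for an arbitrarily large database
    winner, used = choose_when_confident(
        cands, toy_stream,
        score_fn=lambda c, x: x + (0.05 if c.name == "add edge A->B" else 0.0),
        score_range=1.05, delta=1e-6)
    print(winner.name, "chosen after", used, "examples")
```

In this toy run the decision is made after a few thousand examples regardless of whether the stream holds a million or a billion records, which mirrors the abstract's claim that the running time becomes independent of database size while the decisions remain essentially those that would be made given infinite data.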
