Classification for Free

Abstract

Currently, data classification is performed either on data stored in relational databases or on data stored in small flat files. The problem with these approaches is that, for large data sets, they often need multiple scans of the original data and are therefore infeasible in many real applications. In this paper, we propose to deploy classification on top of OLAP (online analytical processing) and data cube systems. First, we compute statistics over various combinations of the attributes, also known as data cubes. These statistics are then used to derive the classification model. In this way, we scan the original data only once, which improves the performance of classification significantly. Furthermore, since data cubes in decision support systems are usually already pre-computed for answering OLAP queries, our new classifier provides "free" classification functions by eliminating the dominating I/O overhead of scanning the original data.
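The abstract does not say which statistics or which classification model the authors use, so the following is only a minimal sketch of the general idea under stated assumptions: the cube cells hold class counts per attribute-value combination, and the model is a decision tree that picks splits by information gain. The names `build_cube`, `entropy`, and `info_gain` are hypothetical and do not come from the paper. The point illustrated is that the raw rows are read once while the cube is built, after which split quality is computed from the counts alone.

```python
from collections import Counter
from itertools import combinations
from math import log2


def build_cube(rows, attrs, label, max_dims=1):
    """Single pass over the rows: count class labels for every
    group-by of up to max_dims attributes (a small data cube)."""
    cube = Counter()
    for row in rows:
        for k in range(max_dims + 1):
            for dims in combinations(attrs, k):
                values = tuple(row[d] for d in dims)
                cube[(dims, values, row[label])] += 1
    return cube


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def info_gain(cube, attr):
    """Information gain of splitting on attr, read from cube cells only
    (assumption: the cube was built with max_dims >= 1)."""
    class_counts = Counter()   # from the apex cell group: dims == ()
    by_value = {}              # from the one-dimensional group on attr
    for (dims, values, cls), n in cube.items():
        if dims == ():
            class_counts[cls] += n
        elif dims == (attr,):
            by_value.setdefault(values[0], Counter())[cls] += n
    total = sum(class_counts.values())
    remainder = sum(
        sum(c.values()) / total * entropy(c.values())
        for c in by_value.values()
    )
    return entropy(class_counts.values()) - remainder


# Toy data, invented for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
attrs = ["outlook", "windy"]
cube = build_cube(rows, attrs, "play")        # the only scan of the rows
best = max(attrs, key=lambda a: info_gain(cube, a))
print(best)
```

Choosing splits deeper in a tree would need cells over more attributes (a larger `max_dims`), which is where a pre-computed cube from an OLAP system would, under the same assumptions, take the place of `build_cube`.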
