Conference: Americas Conference on Information Systems (AMCIS 2005)
Using Optimization-Based Classification Method for Massive Datasets

Abstract

Optimization-based algorithms, such as Multi-Criteria Linear Programming (MCLP), have proven effective in classification. However, limits on computing power and memory make it difficult to apply MCLP, or similar optimization methods, to very large datasets. As databases continue to grow, data mining algorithms must be able to function regardless of dataset size. The objectives of this paper are: (1) to propose a new approach that combines stratified random sampling with a majority-vote ensemble, and (2) to compare this approach with the plain MCLP approach (which uses only part of the training set) and with See5 (a decision-tree-based classification tool designed to analyze substantial datasets) on the KDD99 and KDD2004 datasets. The results indicate that the new approach not only has the potential to handle datasets of arbitrary size, but also outperforms the plain MCLP approach and achieves classification accuracy comparable to See5.
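The abstract names two building blocks, stratified random sampling and majority voting, without giving details. The Python sketch below shows one way they could fit together and is not the authors' implementation. All function names, the sampling fraction, and the number of ensemble members are assumptions made for illustration. The base learner is a simple nearest-centroid classifier standing in for MCLP, which would require a linear programming solver.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, labels, fraction, rng):
    """Draw the same fraction from every class (stratum), without replacement."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    picked = []
    for indices in by_class.values():
        k = max(1, round(fraction * len(indices)))  # keep at least one row per class
        picked.extend(rng.sample(indices, k))
    return [rows[i] for i in picked], [labels[i] for i in picked]


def train_centroid(rows, labels):
    """Placeholder base learner (NOT MCLP): one mean vector per class."""
    counts = Counter(labels)
    sums = {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(model, key=lambda y: sqdist(model[y]))


def train_ensemble(rows, labels, n_models=5, fraction=0.2, seed=0):
    """Train each member on its own stratified sample of the full training set."""
    rng = random.Random(seed)
    return [
        train_centroid(*stratified_sample(rows, labels, fraction, rng))
        for _ in range(n_models)
    ]


def predict_majority(models, x):
    """Combine the members' predictions by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic two-class data, made up purely for this demonstration.
    rng = random.Random(1)
    rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    rows += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(50)]
    labels = ["normal"] * 200 + ["attack"] * 50
    models = train_ensemble(rows, labels)
    print(predict_majority(models, [5.0, 5.0]))
```

Each member only ever sees a fraction of the data, so the memory needed per training run depends on the sample size and not on the size of the full dataset. An odd number of members avoids tied votes in the two-class case.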

Bibliographic details

  • Conference location: Omaha, NE (US)
  • Author affiliations

    Peter Kiewit Institute of Information Science, Technology & Engineering, University of Nebraska, Omaha, NE 68182. Phone: +1 (402) 554-3429 or +1 (402) 554-3625. ypeng@mail.unomaha.edu;

    Peter Kiewit Institute of Information Science, Technology & Engineering, University of Nebraska, Omaha, NE 68182. Phone: +1 (402) 554-3429 or +1 (402) 554-3625. gkou@mail.unomaha.edu;

    Peter Kiewit Institute of Information Science, Technology & Engineering, University of Nebraska, Omaha, NE 68182. Phone: +1 (402) 554-3429 or +1 (402) 554-3625. Chinese Academy of Sciences Research Center on Data Mining and Knowledge Management, Beijing 100039, China. Phone: +86 13651346898. yshi@mail.unomaha.edu, yshi@gscas.ac.cn;

    Peter Kiewit Institute of Information Science Technology & Engineering Univ;

  • Original format: PDF
  • Keywords

    Classification; Stratified Random Sampling; Majority vote; MCLP;

