Journal: Research Journal of Applied Science, Engineering and Technology

A Novel Ensemble Classifier based Classification on Large Datasets with Hybrid Feature Selection Approach


Abstract

Exploring and analyzing large datasets has been an active research area in data mining over the last two decades. Several approaches are available in the literature for investigating large datasets that comprise millions of records. The most important data mining steps involved in this task are preprocessing, feature selection and classification, and each of the three matters for carrying out the task effectively. Most existing techniques suffer from high complexity and high computational cost on large datasets. In particular, existing classification techniques do not provide consistent and reliable results for large datasets, which makes existing classification systems inefficient and unreliable. This study focuses on developing a novel and efficient framework for analyzing and classifying large datasets, and it proposes a classification approach for large datasets based on ensemble classification. First, an efficient preprocessing approach based on enhanced KNN is applied, followed by feature selection based on a genetic algorithm integrated with Kernel PCA, which selects a subset of informative attributes or variables from which to construct models of the data. Classification is then carried out on the selected features using the ensemble approach to obtain accurate results. The study presents two types of ensemble classifiers, homogeneous and heterogeneous, to evaluate the performance of the proposed system. Experimental results show that the proposed approach provides significant results on various large datasets.
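The distinction between homogeneous and heterogeneous ensembles mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the base learners, the combination rule, or how training data is distributed among members, so everything below is an assumption made for illustration: majority voting as the combiner, 1-nearest-neighbour models trained on disjoint data slices for the homogeneous case, and a k-NN model, a nearest-centroid model and a one-feature threshold rule for the heterogeneous case. All class and function names are hypothetical, and the preprocessing and feature-selection stages are not shown.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label in a list of predictions."""
    return Counter(labels).most_common(1)[0][0]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class KNN:
    """Plain k-nearest-neighbour classifier (illustrative base learner)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        ranked = sorted(zip(self.X, self.y), key=lambda p: sq_dist(p[0], x))
        return majority_vote([label for _, label in ranked[: self.k]])


class NearestCentroid:
    """Assigns the label of the closest class mean."""

    def fit(self, X, y):
        groups = {}
        for xi, yi in zip(X, y):
            groups.setdefault(yi, []).append(xi)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda c: sq_dist(self.centroids[c], x))


class Stump:
    """One-feature threshold rule chosen to minimise training errors."""

    def fit(self, X, y):
        best = None
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [yi for row, yi in zip(X, y) if row[f] <= t]
                right = [yi for row, yi in zip(X, y) if row[f] > t]
                if not right:
                    continue  # threshold does not split the data
                l_lab, r_lab = majority_vote(left), majority_vote(right)
                errors = sum(yi != l_lab for yi in left)
                errors += sum(yi != r_lab for yi in right)
                if best is None or errors < best[0]:
                    best = (errors, f, t, l_lab, r_lab)
        _, self.f, self.t, self.left, self.right = best
        return self

    def predict_one(self, x):
        return self.left if x[self.f] <= self.t else self.right


class VotingEnsemble:
    """Combines member predictions by majority vote (assumed combiner)."""

    def __init__(self, members):
        self.members = members

    def predict_one(self, x):
        return majority_vote([m.predict_one(x) for m in self.members])


def homogeneous_ensemble(X, y, n_members=3):
    """Same algorithm for every member; each sees a disjoint data slice."""
    members = [KNN(k=1).fit(X[i::n_members], y[i::n_members])
               for i in range(n_members)]
    return VotingEnsemble(members)


def heterogeneous_ensemble(X, y):
    """Different algorithms, all trained on the full training set."""
    return VotingEnsemble([KNN(k=3).fit(X, y),
                           NearestCentroid().fit(X, y),
                           Stump().fit(X, y)])


# Toy data: two well-separated clusters (made up for this sketch).
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
     (2.0, 2.1), (2.2, 1.9), (1.9, 2.3), (2.1, 2.0)]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]

homo = homogeneous_ensemble(X, y)
hetero = heterogeneous_ensemble(X, y)
print(homo.predict_one((0.1, 0.1)), hetero.predict_one((2.0, 2.0)))
```

Training each homogeneous member on a separate slice keeps the per-member cost low, which is one plausible way to cope with large datasets. Bootstrap sampling (bagging) would be an equally reasonable choice here.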
