IEEE International Conference of Intelligent Applied Systems on Engineering

A Distributed Intelligent Algorithm Applied to Imbalanced Data

Abstract

Data mining is the process of finding valuable information in databases or data sets. In imbalanced data, one class has an extremely small number of samples compared with the others, and such problems are difficult to solve with traditional data mining methods. In this paper, a distributed intelligent algorithm is proposed for imbalanced data. Apache Spark serves as the distributed framework of the proposed algorithm; its cluster computing framework, with an in-memory data processing engine, can perform analytics on large volumes of data. Within this framework, the synthetic minority oversampling technique (SMOTE) is first applied on Apache Spark to process the imbalanced data. A support vector machine (SVM) is then used to classify the data. The Zoo data set from the UCI Machine Learning Repository is used to verify the correctness of the proposed algorithm. The results show that the proposed distributed intelligent algorithm achieves better performance than the traditional classifiers it is compared with.
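The SMOTE oversampling step described above can be illustrated with a minimal single-machine sketch in Python. It is not the authors' Spark implementation: the function name `smote`, its parameters (`n_synthetic`, `k`, `seed`), the brute-force neighbour search, and the toy data are all assumptions made for illustration. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours, which is the core idea of SMOTE.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Return n_synthetic synthetic minority-class samples (illustrative sketch).

    minority: list of feature vectors (lists of floats) from the minority class.
    Each synthetic sample is x + gap * (neighbour - x), where x is a randomly
    chosen minority sample, neighbour is one of its k nearest minority
    neighbours, and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist

    # Precompute each sample's k nearest minority neighbours (brute force).
    neighbours = []
    for i, x in enumerate(minority):
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nb = rng.choice(neighbours[i])
        gap = rng.random()  # position along the segment x -> nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: six majority samples, three minority samples.
    majority = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
    minority = [[5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]

    # Oversample the minority class until both classes have the same size.
    new = smote(minority, len(majority) - len(minority), k=2, seed=42)
    balanced_minority = minority + new
    print(len(balanced_minority))  # 6
```

In the pipeline the abstract outlines, the balanced training set would then be passed to an SVM classifier. The abstract does not say which SVM implementation was used; Spark MLlib, for example, provides `pyspark.ml.classification.LinearSVC` for linear SVMs. A distributed version would also have to decide whether nearest neighbours are searched within each partition or across the whole data set, a choice this sketch does not address.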