
A Parallel Oversampling Algorithm Based on NRSBoundary-SMOTE



Abstract

Classification of data sets with imbalanced class distributions has long been regarded as a difficult problem in knowledge discovery, and re-sampling is an effective way to deal with imbalanced data sets. SMOTE is a widely used over-sampling algorithm; however, it does not consider the distribution of the original data when it generates new synthetic samples. NRSBoundary-SMOTE is based on Neighborhood Rough Set theory. It selects only the minority class samples that belong to the boundary region to generate synthetic samples, and it can effectively improve the accuracy on the minority class. However, it needs to compute the distance between every pair of samples when partitioning the data set, a process that takes a long time on large data sets. We therefore propose a parallel over-sampling method based on NRSBoundary-SMOTE, called Parallel-NRSBoundary-SMOTE, and implement it with the MapReduce programming paradigm. Experimental results obtained on Hadoop clusters show that our method maintains the accuracy on the minority class and improves efficiency on large data sets.
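The abstract gives no pseudocode, so the following is only a minimal sketch of the idea it describes, not the authors' implementation. It assumes that a minority sample belongs to the boundary region when its neighborhood of radius delta contains at least one sample of another class, and it imitates the map and reduce steps with plain Python functions rather than Hadoop. The names map_boundary, reduce_boundary, smote_from, delta and k are hypothetical, and k is assumed to be at least 1.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_boundary(chunk, X, y, minority, delta):
    """Map step (hypothetical): scan one chunk of row indices and return the
    minority samples whose delta-neighborhood contains another class."""
    found = []
    for i in chunk:
        if y[i] != minority:
            continue
        if any(y[j] != minority and euclidean(X[i], X[j]) <= delta
               for j in range(len(X))):
            found.append(i)
    return found


def reduce_boundary(partials):
    """Reduce step (hypothetical): merge the per-chunk index lists."""
    return sorted(i for part in partials for i in part)


def smote_from(seeds, X, y, minority, k, n_new, rng):
    """Classic SMOTE interpolation, seeded only from the given samples."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    if not seeds or len(minority_idx) < 2:
        return []
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        # k nearest minority neighbors of the seed, excluding the seed itself
        neighbors = sorted((j for j in minority_idx if j != i),
                           key=lambda j: euclidean(X[i], X[j]))[:k]
        j = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from X[i] to X[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


# Toy data (made up for illustration): label 1 is the minority class.
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [1.3, 1.1],
     [1.2, 1.4], [3.0, 3.0], [3.1, 3.2], [3.5, 3.0], [2.8, 3.4]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
chunks = [range(0, 5), range(5, 10)]  # stands in for MapReduce input splits

boundary = reduce_boundary(map_boundary(c, X, y, 1, 0.5) for c in chunks)
print(boundary)  # [2, 3]

new_points = smote_from(boundary, X, y, 1, k=1, n_new=4, rng=random.Random(0))
print(new_points)
```

Each chunk still compares its rows against the whole data set, so the total number of distance computations is unchanged; the sketch only shows how the work could be divided among independent map tasks. The published method may apply further checks on the synthetic samples that are not modeled here.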

Bibliographic record

  • Source
    Journal of Information and Computational Science | 2014, Issue 13 | pp. 4655-4665 | 11 pages
  • Author affiliations

    Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;

    Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;

    Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;

    Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Imbalanced Data; Neighborhood RoughSet; NRSBoundary-SMOTE; MapReduce; Parallel;

