Journal: Pattern Recognition Letters

Handling imbalanced data sets with synthetic boundary data generation using bootstrap re-sampling and AdaBoost techniques



Abstract

Class imbalance is common in applications such as bioinformatics. When the data are imbalanced, prediction correctness is usually biased towards the majority class. In several applications, however, prediction accuracy on the minority class matters as much as accuracy on the majority class. Many techniques have been proposed to increase minority-class accuracy. They are based on re-sampling, either over-sampling or under-sampling, during the training process, but they do not consider how the data are scattered in the space. In this paper, we propose a new technique based on the fact that the location of the separating function between any two sub-clusters in different classes is defined only by the boundary data of each sub-cluster. In addition, accuracy is measured only on the testing set. Our technique adapts the concept of bootstrapping to estimate a new region for each sub-cluster and to synthesize new boundary data. The new region is intended to cope with unseen testing data. All newly synthesized data are classified using the concept of the AdaBoost algorithm. Our results outperform the other techniques under several performance evaluation functions.
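
The abstract does not give the algorithm's details, so the following is only a rough illustration of the general idea of bootstrapping a sub-cluster's region and placing synthetic points on its boundary. It is not the authors' method. The spherical region, the margin rule (largest bootstrapped radius plus one standard deviation), and the names `bootstrap_region` and `synthesize_boundary` are all assumptions made for this sketch. The AdaBoost step is omitted; in a full pipeline a boosted classifier would presumably be trained on the original data together with the synthetic points.

```python
import math
import random
import statistics


def bootstrap_region(points, n_rounds=200, seed=0):
    """Estimate a centroid and an enlarged radius for one sub-cluster.

    Assumption: the region is modelled as a sphere. Each bootstrap round
    resamples the points with replacement and records the largest distance
    from the resampled centroid.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    radii = []
    for _ in range(n_rounds):
        sample = [rng.choice(points) for _ in points]  # with replacement
        center = [statistics.fmean(p[d] for p in sample) for d in range(dim)]
        radii.append(max(math.dist(p, center) for p in sample))
    centroid = [statistics.fmean(p[d] for p in points) for d in range(dim)]
    # Hypothetical margin rule: widest bootstrapped radius plus one std.
    radius = max(radii) + statistics.stdev(radii)
    return centroid, radius


def synthesize_boundary(centroid, radius, n_new, seed=0):
    """Place n_new synthetic points on the sphere around the centroid."""
    rng = random.Random(seed)
    new_points = []
    for _ in range(n_new):
        # Random direction: Gaussian components normalised to unit length.
        direction = [rng.gauss(0.0, 1.0) for _ in centroid]
        norm = math.sqrt(sum(x * x for x in direction))
        new_points.append(
            [c + radius * x / norm for c, x in zip(centroid, direction)]
        )
    return new_points


# Toy minority sub-cluster (made-up values, for illustration only).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7]]
centroid, radius = bootstrap_region(minority)
synthetic = synthesize_boundary(centroid, radius, n_new=10)
```

Placing every synthetic point exactly on the estimated boundary is a simplification chosen to keep the sketch short; other region shapes or margin rules would change only `bootstrap_region`.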
