Journal: Expert Systems with Applications

A comparison of classification methods across different data complexity scenarios and datasets

Abstract

Recent research has assessed the performance of classification methods mainly on concrete datasets whose statistical characteristics are unknown or unreported. Furthermore, performance is often determined by only one measure, such as the area under the receiver operating characteristic curve. This paper compares the performance of several classification methods in four different complexity scenarios and on datasets described by five data characteristics. Synthetic datasets are used to control the statistical characteristics, and real datasets are used to verify the findings. The performance of each classification method is determined by six measures. The investigation reveals that heterogeneous classifiers perform best on average; bagged CART is especially recommendable for datasets with low dimensionality and high sample size; kernel-based classification methods perform very well, especially with a polynomial kernel, but require a rather long training time; and a nearest shrunken neighbor classifier is recommendable for unbalanced datasets. These findings help researchers and practitioners find an appropriate method for their binary classification problems.
