Knowledge and Information Systems

An automatic extraction method of the domains of competence for learning classifiers using data complexity measures

Abstract

The constant appearance of new algorithms and problems in data mining makes it impossible to know in advance whether a model will perform well or poorly until it is applied, which can be costly. It would be useful to have a procedure that indicates, before the learning algorithm is applied and without needing a comparison with other methods, whether the outcome will be good or bad, using only the information available in the data. In this work, we present an automatic extraction method to determine the domains of competence of a classifier using a set of data complexity measures proposed for the task of classification. These domains codify the characteristics of the problems that are or are not suitable for the classifier, relating potentially difficult geometrical structures in the data to the final accuracy obtained by any classifier. To do so, the proposal computes 12 data complexity metrics over a large benchmark of datasets to analyze the behavior patterns of the method, obtaining intervals of the data complexity measures associated with good or bad performance. Three classical but different algorithms, C4.5, SVM and K-NN, are used as representative classifiers to analyze the proposal. From these intervals, two simple rules are obtained for each classifier, one describing its good behavior and one its bad behavior, allowing the user to characterize the response quality of the method from a dataset's complexity. These two rules have been validated on fresh problems, showing that they are general and accurate. Thus, it can be established before its application whether the classifier will perform well or poorly.
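As an illustration of what one data complexity measure and an interval-based rule might look like, here is a minimal Python sketch. It assumes that the set of 12 measures includes the maximum Fisher's discriminant ratio (F1) from the standard Ho and Basu collection, which the abstract does not name explicitly. The function names (`fisher_ratio_f1`, `predict_behavior`), the interval thresholds in `GOOD_INTERVALS` and `BAD_INTERVALS`, and the rule shape (predict "good" when a value falls in a good interval and in no bad interval) are all hypothetical and chosen only for illustration; they are not the intervals or rules obtained in the paper.

```python
from statistics import mean, pvariance


def fisher_ratio_f1(X, y):
    """Maximum Fisher's discriminant ratio (F1) for a two-class dataset.

    X: list of feature vectors (lists of floats); y: list of labels with
    exactly two distinct values. For each feature j the ratio is
    (mu_a - mu_b)^2 / (var_a + var_b); F1 is the maximum over features.
    Larger values mean some single feature separates the classes well.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch of F1 assumes exactly two classes")
    a = [x for x, c in zip(X, y) if c == labels[0]]
    b = [x for x, c in zip(X, y) if c == labels[1]]
    best = 0.0
    for j in range(len(X[0])):
        fa = [row[j] for row in a]
        fb = [row[j] for row in b]
        num = (mean(fa) - mean(fb)) ** 2
        den = pvariance(fa) + pvariance(fb)
        # Zero within-class variance: perfectly separable if the means differ.
        ratio = (float("inf") if num > 0 else 0.0) if den == 0 else num / den
        best = max(best, ratio)
    return best


# HYPOTHETICAL intervals for illustration only (not the paper's values).
GOOD_INTERVALS = {"F1": (1.0, float("inf"))}
BAD_INTERVALS = {"F1": (0.0, 0.2)}


def in_any(measures, intervals):
    """True if any available measure lies inside its closed interval."""
    return any(
        lo <= measures[name] <= hi
        for name, (lo, hi) in intervals.items()
        if name in measures
    )


def predict_behavior(measures):
    """Hypothetical rule: 'good', 'bad', or 'undetermined'."""
    good = in_any(measures, GOOD_INTERVALS)
    bad = in_any(measures, BAD_INTERVALS)
    if good and not bad:
        return "good"
    if bad and not good:
        return "bad"
    return "undetermined"


if __name__ == "__main__":
    X = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.95], [1.1, 1.05]]
    y = [0, 0, 1, 1]
    f1 = fisher_ratio_f1(X, y)
    print(round(f1, 6), predict_behavior({"F1": f1}))
```

In the usage example the first feature separates the two classes cleanly, so F1 is large (about 200) and the hypothetical rule predicts good behavior; a dataset whose class means coincide on every feature would get F1 = 0 and fall in the "bad" interval. Because each measure is computed from the data alone, a check of this kind can run before any classifier is trained, which is the point the abstract makes.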