How to Interpret Decision Trees?

Abstract

Data mining methods are widely used across many disciplines to identify patterns, rules, or associations in large volumes of data. In the past, technical domains relied mostly on black-box methods such as neural networks and support vector machines, whereas medical domains preferred methods that can explain their results. Now that the advantages and disadvantages of these methods are better understood, data mining methods with explanation capability are also used in technical domains. Decision tree induction, for example with C4.5, is the most widely preferred method because it performs well on average regardless of the data set. It can learn a decision tree without heavy user interaction, whereas neural networks require a lot of time for training. Cross-validation can be applied to decision tree induction; it helps ensure that the estimated error rate comes close to the true error rate. The error rate and the specific goodness measures described in this paper are quantitative measures that help in understanding the quality of the model. The data collection problem, including noise in the data, must also be considered; specialized accuracy measures and proper visualization methods help to understand it. Because decision tree induction is a supervised method, the associated data labels are a further source of problems, and re-labeling should be considered after the model has been learnt. This paper also discusses how to fit the learnt model to the expert's knowledge and how to compare two decision trees with respect to their explanatory power. Finally, we summarize our methodology for the interpretation of decision trees.
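To make the cross-validation step concrete, here is a minimal sketch of k-fold error estimation. It is an illustration under stated assumptions, not the paper's procedure: a one-split tree (a decision stump) stands in for C4.5, the toy data set is invented, and the names `train_stump`, `predict`, and `cross_val_error` are hypothetical helpers introduced only for this example.

```python
import random


def majority(labels):
    """Most frequent label; ties are broken alphabetically for determinism."""
    return max(sorted(set(labels)), key=labels.count)


def train_stump(rows, labels):
    """Fit a one-split tree (a stand-in for C4.5, not C4.5 itself).

    Tries every (feature, threshold) pair and keeps the split with the
    fewest training errors. Returns (feature, threshold, left, right).
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send samples to both branches
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(y != l_lab for y in left)
                      + sum(y != r_lab for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]


def predict(stump, row):
    """Classify one row with a fitted stump."""
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def cross_val_error(rows, labels, k=5, seed=0):
    """Estimate the error rate by k-fold cross-validation.

    Each sample is held out exactly once; the model is always trained
    on the remaining folds, so no sample is tested on a model that saw it.
    """
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        stump = train_stump([rows[i] for i in train],
                            [labels[i] for i in train])
        errors += sum(predict(stump, rows[i]) != labels[i] for i in fold)
    return errors / len(rows)


# Invented toy data: one numeric feature, class changes at x = 10.
ROWS = [[x] for x in range(20)]
LABELS = ["low" if x < 10 else "high" for x in range(20)]

if __name__ == "__main__":
    print("cross-validated error rate:", cross_val_error(ROWS, LABELS))
```

On this toy data the stump fits the full training set without error, yet the cross-validated estimate is above zero because the sample nearest the class boundary is misclassified whenever it is held out. That gap between training error and held-out error is what cross-validation is meant to expose.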
