International Conference on Smart Systems and Data Science

A survey of methods and tools used for interpreting Random Forest



Abstract

Interpretability of highly performant Machine Learning (ML) methods, such as Random Forest (RF), is a key concern that attracts great interest in data mining research. In the state of the art, RF is well known as an efficient ensemble learning method in terms of predictive accuracy, flexibility, and straightforwardness. Moreover, it is recognized as an intuitive and intelligible approach regarding its building process. However, it is also regarded as a black-box model because of its hundreds of deep decision trees. This can be crucial for several fields of study, such as healthcare, biology, and security, where the lack of interpretability is a real disadvantage. Indeed, the interpretability of RF models is generally necessary in such fields of application, for different reasons. The more ML users grasp what is going on inside an ML system (both the process and the resulting model), the more they can trust it and act on the knowledge extracted from it. Furthermore, ML models are increasingly constrained by new laws that require regulation and interpretation of the knowledge they provide. Several papers have tackled the interpretation of the models produced by RF. Interpretation has been associated with different aspects, depending on the specificity of the issue studied as well as on the users to whom the explanations are addressed. Therefore, this paper surveys the tools and methods used in the literature to uncover insights in RF models. These tools are classified according to the different aspects that characterize interpretability. In practice, this classification should guide the choice of the most useful tools for interpretation and deep analysis of an RF model, depending on the interpretability aspect sought. It should also be valuable for researchers who aim to focus their work on the interpretability of RF, or of ML in general.
