An Information-Theoretic Approach for Setting the Optimal Number of Decision Trees in Random Forests

Abstract

Data classification is a process within the data mining and machine learning field that aims to annotate all instances of a dataset with so-called class labels. This involves building a model from a training set of instances that are already labeled; the model can then be used to assign classes to instances that are not yet labeled. A successful way of performing classification is the Random Forests (RF) algorithm, which is a type of ensemble-based classifier. An ensemble-based classifier increases the accuracy of the class label assigned to a data instance by using a set of classifiers trained on different, but possibly overlapping, instance sets, and then combining the intermediate classification results obtained in this way. RF in particular uses a number of decision trees to classify an instance and then takes the majority vote of these trees as the final classification. This step is critical in RF and strongly affects the accuracy of the final classifier. In this paper, we propose a variation of RF that adjusts one of its two parameters, the number of decision trees, according to a meaningful relation between the dataset's predictive power rating and the number of trees, with the goal of improving both the accuracy and the performance of the algorithm. A comprehensive experimental evaluation on several clean datasets demonstrates this improvement.
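
The two ingredients the abstract relies on, combining tree outputs by majority vote and rating how predictive a dataset's attributes are, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' method: `majority_vote`, `entropy`, and `information_gain` are hypothetical helper names, predictive power is approximated here by the standard information gain of a single categorical attribute (one of the paper's keywords), and the relation the paper uses to map predictive power to a number of trees is not given in the abstract, so it is deliberately not reproduced.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(tree_predictions: Sequence[Hashable]) -> Hashable:
    """Combine the labels predicted by individual trees for one instance.

    Hypothetical helper. Ties go to the label seen first, because
    Counter keeps first-occurrence order and most_common(1) returns
    the first of several equally frequent labels.
    """
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels: Sequence[Hashable], feature: Sequence[Hashable]) -> float:
    """Reduction in label entropy obtained by splitting on one categorical feature.

    Used here only as an assumed stand-in for a "predictive power" score.
    """
    if len(labels) != len(feature):
        raise ValueError("labels and feature must have the same length")
    n = len(labels)
    # Group the class labels by the value the feature takes.
    groups: dict = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy remaining after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

For example, `majority_vote(["spam", "ham", "spam"])` returns `"spam"`, and an attribute that perfectly separates two equally frequent classes, such as `information_gain(["yes", "yes", "no", "no"], ["a", "a", "b", "b"])`, scores 1 bit, while an attribute independent of the class scores 0.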

Bibliographic record

Source: IEEE International Conference on Systems, Man, and Cybernetics, 2013, 7 pages
Authors: Alfredo Cuzzocrea; Shane Leo Francis; Mohamed Medhat Gaber
Original format: PDF
Chinese Library Classification: TP13-53
Keywords: Random Forests; Data Mining; Data Classification; Predictive Power; Information Gain; Ensemble Classification

Similar literature

1. Urbanization Analysis Using Spatial Support and Improved Random forest Decision Tree Approach [J]. P. Kalyani, P. Govindarajulu. International Journal of Computer Science and Network Security, 2017, no. 4.

2. Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology [J]. Mistry Pritesh, Neagu Daniel, Trundle Paul R. Soft Computing: A Fusion of Foundations, Methodologies and Applications, 2016, no. 8.

3. Modeling flood susceptibility using data-driven approaches of naive Bayes tree, alternating decision tree, and random forest methods [J]. Chen Wei, Li Yang, Xue Weifeng. The Science of the Total Environment, 2020, no. Jana20.

4. An Information-Theoretic Approach for Setting the Optimal Number of Decision Trees in Random Forests [C]. Alfredo Cuzzocrea, Shane Leo Francis, Mohamed Medhat Gaber. IEEE International Conference on Systems, Man, and Cybernetics, 2013.

5. Predicting Credit Union Customer Churn Behavior Using Decision Trees, Logistic Regression, and Random Forest Models [D]. Barr, Frederick. 2020.

6. Sensopeptidomic Kinetic Approach Combined with Decision Trees and Random Forests to Study the Bitterness during Enzymatic Hydrolysis Kinetics of Micellar Caseins [O]. Dahlia Daher, Barbara Deracinois, Philippe Courcoux. 2021.

