International Conference on Rough Sets and Current Trends in Computing (RSCTC 2006); November 6-8, 2006; Kobe, Japan

A Statistical Method for Determining Importance of Variables in an Information System

Abstract

A new method for estimation of attributes' importance for supervised classification, based on the random forest approach, is presented. Essentially, an iterative scheme is applied, with each step consisting of several runs of the random forest program. Each run is performed on a suitably modified data set: values of each attribute found unimportant at earlier steps are randomly permuted between objects. At each step, apparent importance of an attribute is calculated and the attribute is declared unimportant if its importance is not uniformly better than that of the attributes earlier found unimportant. The procedure is repeated until only attributes scoring better than the randomized ones are retained. Statistical significance of the results so obtained is verified. This method has been applied to 12 data sets of biological origin. The method was shown to be more reliable than that based on standard application of a random forest to assess attributes' importance.
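The iterative scheme described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The names `select_attributes` and `abs_correlation_importance` are hypothetical. The scorer is a simple stand-in (absolute Pearson correlation with the class label) used in place of random forest importance so that the example needs only the standard library; any function returning one score per column could be passed instead. Two further assumptions are made because the abstract does not specify them: "uniformly better" is read as "scores higher than every randomized attribute in every run of the step", and a shuffled copy of the first attribute is added as a probe so that the first step has a randomized reference. The significance check mentioned in the abstract is not included.

```python
import random
from statistics import mean


def abs_correlation_importance(columns, labels):
    """Stand-in scorer (NOT a random forest): |Pearson r| of each column with the labels."""
    my = mean(labels)
    vy = sum((y - my) ** 2 for y in labels)
    scores = []
    for col in columns:
        mx = mean(col)
        vx = sum((x - mx) ** 2 for x in col)
        cov = sum((x - mx) * (y - my) for x, y in zip(col, labels))
        # A constant column carries no information, so it scores 0.
        scores.append(abs(cov) / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0)
    return scores


def select_attributes(columns, labels, importance_fn, runs=5, max_steps=20, seed=0):
    """Return indices of attributes that keep beating the randomized ones."""
    rng = random.Random(seed)
    n_attrs = len(columns)
    # Assumption: a probe column (copy of attribute 0, reshuffled on every run)
    # gives the first step something randomized to compare against.
    all_columns = list(columns) + [list(columns[0])]
    randomized = {n_attrs}  # indices whose values are permuted between objects
    for _ in range(max_steps):
        run_scores = []
        for _ in range(runs):
            # Permute every attribute already found unimportant (and the probe).
            data = [rng.sample(col, len(col)) if j in randomized else col
                    for j, col in enumerate(all_columns)]
            run_scores.append(importance_fn(data, labels))
        # Assumed reading of "uniformly better": strictly above the best
        # randomized attribute in every run of this step.
        newly_unimportant = {
            j for j in range(n_attrs)
            if j not in randomized
            and not all(s[j] > max(s[k] for k in randomized) for s in run_scores)
        }
        if not newly_unimportant:
            break  # only attributes scoring better than the randomized ones remain
        randomized |= newly_unimportant
    return [j for j in range(n_attrs) if j not in randomized]
```

Here `columns` is a list of attribute columns (one list of numeric values per attribute) and `labels` is the list of class labels; a call such as `select_attributes(columns, labels, abs_correlation_importance)` returns the indices of the retained attributes. Passing the scorer as a parameter keeps the permute-and-compare loop independent of how importance is computed.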
