Source: Genes (U.S. National Institutes of Health literature)

Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE



Abstract

Feature selection, which identifies the most informative features in the original feature space, is widely used to simplify predictors. Recursive feature elimination (RFE), one of the most popular feature selection approaches, is effective at reducing data dimensionality and increasing efficiency. RFE produces a ranking of features together with a list of candidate subsets and their corresponding accuracies. The subset with the highest accuracy (HA) or with a preset number of features (PreNum) is often used as the final subset. However, HA may select a large number of features, and without prior knowledge of the preset number, PreNum makes the final subset selection ambiguous and subjective. A proper decision variant is therefore needed to determine the optimal subset automatically. In this study, we conduct pioneering work exploring the decision variant applied after a list of candidate subsets has been obtained from RFE. We provide a detailed analysis and comparison of several decision variants for automatically selecting the optimal feature subset. The random forest-recursive feature elimination (RF-RFE) algorithm and a voting strategy are introduced. We validated the variants on two very different molecular biology datasets, one from a toxicogenomic study and the other from protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE.
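To make the two baseline rules named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows a generic RFE loop that records candidate subsets with their accuracies, followed by the HA and PreNum selection rules. All function names, the toy importances, and the toy accuracy values are hypothetical and chosen only for illustration. In an actual RF-RFE run, the ranking would presumably come from random forest importances refit on each subset and the score from cross-validated accuracy. Breaking HA ties toward the smaller subset is also an assumption made here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A candidate is (feature subset, accuracy of a model trained on that subset).
Candidate = Tuple[Tuple[str, ...], float]


def rfe_candidates(
    features: Sequence[str],
    rank_fn: Callable[[List[str]], Dict[str, float]],
    score_fn: Callable[[List[str]], float],
    step: int = 1,
) -> List[Candidate]:
    """Generic RFE loop: score the current subset, then drop the
    `step` least important features, until one feature remains."""
    current = list(features)
    candidates: List[Candidate] = []
    while current:
        candidates.append((tuple(current), score_fn(current)))
        if len(current) == 1:
            break
        importances = rank_fn(current)  # re-rank on the current subset
        current.sort(key=lambda f: importances[f], reverse=True)
        current = current[: max(1, len(current) - step)]
    return candidates


def select_ha(candidates: List[Candidate]) -> Candidate:
    """HA rule: highest accuracy. Ties go to the smaller subset
    (an assumption of this sketch)."""
    return max(candidates, key=lambda c: (c[1], -len(c[0])))


def select_prenum(candidates: List[Candidate], n: int) -> Candidate:
    """PreNum rule: the candidate with exactly n features."""
    for subset, acc in candidates:
        if len(subset) == n:
            return subset, acc
    raise ValueError(f"no candidate subset with {n} features")


# Hypothetical stand-ins for random forest importances and
# cross-validated accuracy; the numbers are made up for illustration.
TOY_IMPORTANCE = {"f1": 0.50, "f2": 0.30, "f3": 0.10, "f4": 0.07, "f5": 0.03}
TOY_ACCURACY = {5: 0.90, 4: 0.92, 3: 0.92, 2: 0.85, 1: 0.70}

cands = rfe_candidates(
    list(TOY_IMPORTANCE),
    rank_fn=lambda fs: {f: TOY_IMPORTANCE[f] for f in fs},
    score_fn=lambda fs: TOY_ACCURACY[len(fs)],
)
ha_subset, ha_acc = select_ha(cands)
pre_subset, pre_acc = select_prenum(cands, 2)
print("HA:", ha_subset, ha_acc)
print("PreNum(2):", pre_subset, pre_acc)
```

On this toy data, HA has to choose between two subsets with equal accuracy, and PreNum depends entirely on the caller supplying `n`. These are the situations in which the abstract argues an automatic decision variant is needed.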
