Conference: Pattern recognition in bioinformatics

Ensemble Logistic Regression for Feature Selection


Abstract

This paper describes a novel feature selection algorithm embedded into logistic regression. It specifically addresses high-dimensional data with few observations, which is common in the biomedical domain, for example microarray data. The overall objective is to optimize the predictive performance of a classifier while also favoring sparse and stable models. Feature relevance is first estimated from a simple t-test ranking. This initial relevance is treated as a feature sampling probability, and a multivariate logistic regression is iteratively re-estimated on subsets of randomly and non-uniformly sampled features. At each iteration, the feature sampling probability is adapted according to the predictive performance and the weights of the logistic regression. Taken as a whole, the proposed selection method can be seen as an ensemble of logistic regression models that jointly vote for the final relevance of features. Experiments on several microarray datasets show that the proposed method offers comparable or better stability and significantly better predictive performance than logistic regression regularized with Elastic Net. It also outperforms selection based on Random Forests, another popular embedded feature selection method built from an ensemble of classifiers.
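The abstract does not state the exact update rule, so what follows is only a minimal Python/NumPy sketch of the general scheme, not the authors' method. It assumes a Welch t statistic for the initial ranking, a plain L2-regularized logistic regression fitted by gradient descent, a random half/half split to measure predictive performance, and a hypothetical blending rule with step size `eta` in which each sampled feature votes with its share of the absolute weights, scaled by validation accuracy. All names and parameters (`ensemble_lr_relevance`, `subset_size`, `eta`, `n_iter`) are illustrative.

```python
import numpy as np

def t_statistics(X, y):
    """Welch two-sample t statistic of each feature (class 1 vs class 0)."""
    a, b = X[y == 1], X[y == 0]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return diff / (se + 1e-12)

def fit_logreg(X, y, l2=1e-2, lr=0.1, n_steps=300):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_steps):
        z = np.clip(X @ w + b, -30.0, 30.0)  # avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def ensemble_lr_relevance(X, y, n_iter=100, subset_size=10, eta=0.1, seed=0):
    """Return per-feature relevance as a probability distribution over features."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initial relevance from the t-test ranking, used as the sampling probability.
    prob = np.abs(t_statistics(X, y)) + 1e-6
    prob /= prob.sum()
    for _ in range(n_iter):
        # Non-uniformly sample a feature subset according to the current relevance.
        feats = rng.choice(d, size=subset_size, replace=False, p=prob)
        # Assumed: a random half/half split to estimate predictive performance.
        perm = rng.permutation(n)
        tr, va = perm[: n // 2], perm[n // 2 :]
        w, b = fit_logreg(X[tr][:, feats], y[tr])
        acc = np.mean(((X[va][:, feats] @ w + b) > 0) == y[va])
        # Hypothetical update: each sampled feature votes with its share of |w|,
        # scaled by validation accuracy, blended into the current distribution.
        share = np.abs(w) / (np.abs(w).sum() + 1e-12)
        vote = np.zeros(d)
        vote[feats] = acc * share
        prob = (1.0 - eta) * prob + eta * vote
        prob /= prob.sum()
    return prob
```

On synthetic data where only a few features are shifted between classes, ranking features by the returned vector should place those features at the top. Blending with a step size keeps every probability positive, so no feature is ever excluded outright from later sampling rounds.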
