Annals of Translational Medicine

Variable selection in Logistic regression model with genetic algorithm


Abstract

Variable or feature selection is one of the most important steps in model specification. In medical decision-making especially, using a medical database directly, without prior analysis and preprocessing, is often counterproductive. Variable selection is the process of choosing the most relevant attributes from the database in order to build robust learning models and thereby improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to retain clinically important and statistically significant variables while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none is without limitations. For example, the widely used stepwise approach adds the best variable in each cycle and generally produces an acceptable set of variables. Nevertheless, it is limited by the fact that it is commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large when there are tens to hundreds of variables, as is common in present-day clinical data. Genetic algorithms (GA) are heuristic optimization approaches and can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
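
To make the idea concrete: with p candidate variables, best subset selection has 2^p possible subsets to consider, whereas a GA evolves a small population of inclusion masks toward better-scoring subsets. The following is a minimal Python sketch of that search, not the R code from the paper. The function names, the parameter defaults, and the toy fitness function are all assumptions made for illustration; in a real analysis the fitness would more plausibly be a model criterion (for example the negative AIC of a logistic regression fitted on the selected variables).

```python
import random

def run_ga(n_vars, fitness, pop_size=30, generations=40,
           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search 0/1 inclusion masks of length n_vars, maximizing fitness(mask).

    Illustrative GA only: tournament selection, single-point crossover,
    bit-flip mutation, and elitism. All defaults are arbitrary assumptions.
    """
    rng = random.Random(seed)
    # A chromosome is a tuple of flags; 1 means "variable is in the model".
    pop = [tuple(rng.randint(0, 1) for _ in range(n_vars))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            parents = []
            for _ in range(2):
                # Tournament selection: keep the fitter of two random masks.
                a, b = rng.choice(pop), rng.choice(pop)
                parents.append(a if fitness(a) >= fitness(b) else b)
            p1, p2 = parents
            if n_vars > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_vars)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation: each flag toggles with small probability.
            child = tuple(1 - g if rng.random() < mutation_rate else g
                          for g in child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Hypothetical indices of the "truly relevant" variables in a toy problem.
TRUE_VARS = {0, 3, 7}

def toy_fitness(mask):
    # Stand-in for a real model criterion: reward each relevant variable
    # that is included and charge a fixed penalty per included variable.
    hits = sum(1 for i, g in enumerate(mask) if g and i in TRUE_VARS)
    return 10 * hits - 2 * sum(mask)

if __name__ == "__main__":
    best = run_ga(12, toy_fitness)
    print("selected variables:", [i for i, g in enumerate(best) if g])
    print("fitness:", toy_fitness(best))
```

Because the best mask is always carried into the next generation, the best fitness never decreases from one generation to the next. Like any heuristic search, though, the sketch is not guaranteed to reach the global optimum.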
