...
European Journal of Operational Research

Using simulated annealing to optimize the feature selection problem in marketing applications


Abstract

The feature selection (also called specification) problem is concerned with finding the most influential subset of predictors for a predictive model from a much larger set of candidate predictors, which can contain hundreds of variables. The problem belongs to the realm of combinatorial optimization, where the objective is to find the subset of variables that optimizes the value of some goodness-of-fit function. The feature selection problem is NP-hard, which, given its dimensionality, rules out exact solution in practice. Most of the available predictors are noisy or redundant and add little, if anything, to the predictive power of the model. Using all the predictors in the model often results in severe over-fitting and very poor predictions. Constructing a prediction model by checking all possible subsets is impractical because of the computational volume: with p candidate predictors there are 2^p subsets. Looking at the contribution of each predictor separately is inaccurate because it ignores the intercorrelations between predictors. As a result, no analytic solution is available for the feature selection problem, and one must resort to heuristics. In this paper we employ the simulated annealing (SA) approach, one of the leading stochastic search methods, to specify a large-scale linear regression model. The SA results are compared with those of the more common stepwise regression (SWR) approach to model specification. The models are applied to realistic data sets in database marketing. We also use simulated data sets to investigate which data characteristics make the SWR approach equivalent to the supposedly superior SA approach. (c) 2004 Elsevier B.V. All rights reserved.
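The comparison described above can be illustrated with a small, self-contained sketch. This is not the paper's implementation; it is a minimal example under stated assumptions. The goodness-of-fit function is assumed to be BIC of an ordinary least squares fit (lower is better), the SA move is assumed to toggle a single predictor in or out of the subset, and the cooling schedule is assumed to be geometric. The stepwise baseline is a simplified greedy forward selection on the same BIC criterion, not classical SWR with F-test entry and removal thresholds. All function names, parameters (`t0`, `cooling`, `steps_per_t`, `t_min`) and the simulated data are hypothetical, and OLS is solved in pure Python so the example has no dependencies.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def bic(X, y, subset):
    """Assumed objective: BIC of OLS on an intercept plus the predictors in subset."""
    n = len(y)
    rows = [[1.0] + [X[i][j] for j in sorted(subset)] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    try:
        beta = solve(xtx, xty)
    except ValueError:  # collinear subset: treat as infeasible
        return math.inf
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return n * math.log(max(rss, 1e-12) / n) + k * math.log(n)


def anneal(X, y, t0=10.0, cooling=0.9, steps_per_t=20, t_min=1e-3, seed=0):
    """Hypothetical SA search over subsets; a move toggles one predictor in or out."""
    rng, p, cache = random.Random(seed), len(X[0]), {}

    def score(s):
        if s not in cache:  # memoize: SA revisits the same subsets often
            cache[s] = bic(X, y, s)
        return cache[s]

    current = frozenset()
    cur_score = score(current)
    best, best_score, t = current, cur_score, t0
    while t > t_min:
        for _ in range(steps_per_t):
            cand = current ^ {rng.randrange(p)}  # frozenset ^ set -> frozenset
            delta = score(cand) - cur_score
            # Accept improvements always, worse subsets with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, cur_score = cand, score(cand)
                if cur_score < best_score:
                    best, best_score = current, cur_score
        t *= cooling  # geometric cooling schedule
    return set(best), best_score


def forward_stepwise(X, y):
    """Simplified greedy baseline: add the predictor that lowers BIC most; stop when none does."""
    chosen, cur = set(), bic(X, y, set())
    while len(chosen) < len(X[0]):
        new, j = min((bic(X, y, chosen | {j}), j) for j in range(len(X[0])) if j not in chosen)
        if new >= cur:
            break
        chosen.add(j)
        cur = new
    return chosen, cur


def make_data(n=100, p=8, seed=1):
    """Hypothetical simulated data: only predictors 0, 3 and 5 carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2 * r[0] - 3 * r[3] + 1.5 * r[5] + rng.gauss(0, 1) for r in X]
    return X, y


X, y = make_data()
sa_set, sa_score = anneal(X, y)
sw_set, sw_score = forward_stepwise(X, y)
print("SA:", sorted(sa_set), round(sa_score, 2))
print("forward stepwise:", sorted(sw_set), round(sw_score, 2))
```

With only 8 predictors, exhaustive search over all 2^8 = 256 subsets would still be feasible, so this toy setting mainly shows the mechanics. The difference between a greedy search and SA, which can accept temporarily worse subsets to escape local optima, is expected to matter more when there are hundreds of intercorrelated candidate predictors, as in the setting the abstract describes.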
