Journal: Information Sciences: An International Journal

Automatic feature engineering for regression models with machine learning: An evolutionary computation and statistics hybrid



Abstract

Symbolic Regression (SR) is a well-studied task in Evolutionary Computation (EC), where adequate free-form mathematical models must be automatically discovered from observed data. Statisticians, engineers, and general data scientists still prefer traditional regression methods over EC methods because of their solid mathematical foundations, the interpretability of their models, and their lack of randomness, even though such deterministic methods tend to provide lower-quality predictions than stochastic EC methods. On the other hand, while EC solutions can be large and uninterpretable, they can be created with less bias, finding high-quality solutions that human researchers would avoid. Another interesting possibility is to use EC methods to perform automatic feature engineering for a deterministic regression method instead of evolving a single model; this may lead to smaller solutions that are easier to understand. In this contribution, we evaluate an approach called Kaizen Programming (KP) to develop a hybrid method employing EC and Statistics. While the EC method builds the features, the statistical method efficiently builds the models, which are also used to provide the importance of the features; thus, features are improved over the iterations, resulting in better models. Here we examine a large set of benchmark SR problems known from the EC literature. Our experiments show that KP outperforms traditional Genetic Programming (a popular EC method for SR) and also shows improvements over other methods, including other hybrids and well-known statistical and Machine Learning (ML) ones. More in line with ML than EC approaches, KP is able to provide high-quality solutions w
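The loop the abstract describes (an evolutionary step proposes features, a statistical model is fitted on them, and the model's feature importances decide which features survive) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' Kaizen Programming implementation: the primitive set, the product-of-primitives feature encoding, the leave-one-out importance measure, and every function name (`random_feature`, `fit_ols`, `importances`, `kaizen_sketch`) are hypothetical choices made only for illustration.

```python
import math
import random

# Hypothetical primitive set; the paper's actual function set is not given here.
PRIMITIVES = {
    "x": lambda x: x,
    "sq": lambda x: x * x,
    "sin": math.sin,
    "cos": math.cos,
}

def random_feature(rng):
    """A feature is a product of one or two primitives, stored as a tuple of names."""
    return tuple(rng.choice(sorted(PRIMITIVES)) for _ in range(rng.randint(1, 2)))

def evaluate(feature, xs):
    """Evaluate one feature on every input value."""
    return [math.prod(PRIMITIVES[name](x) for name in feature) for x in xs]

def fit_ols(columns, y, ridge=1e-8):
    """Least squares with an intercept, solved through the normal equations.

    A tiny ridge term keeps the system solvable when two features coincide.
    Returns (coefficients, sum of squared errors).
    """
    cols = [[1.0] * len(y)] + columns
    n = len(cols)
    # Augmented matrix [X'X + ridge*I | X'y].
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) + (ridge if i == j else 0.0)
          for j in range(n)] + [sum(a * t for a, t in zip(cols[i], y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    beta = [A[i][n] / A[i][i] for i in range(n)]
    sse = sum((sum(b * col[k] for b, col in zip(beta, cols)) - y[k]) ** 2
              for k in range(len(y)))
    return beta, sse

def importances(features, xs, y):
    """Assumed importance measure: increase in SSE when a feature is left out."""
    cols = [evaluate(f, xs) for f in features]
    _, full = fit_ols(cols, y)
    return [fit_ols(cols[:i] + cols[i + 1:], y)[1] - full for i in range(len(cols))]

def kaizen_sketch(xs, y, n_features=3, iterations=200, seed=0):
    rng = random.Random(seed)
    features = [random_feature(rng) for _ in range(n_features)]
    best_sse = fit_ols([evaluate(f, xs) for f in features], y)[1]
    history = [best_sse]
    for _ in range(iterations):
        trial = features + [random_feature(rng)]   # evolutionary step: propose a feature
        scores = importances(trial, xs, y)         # statistical step: rank the features
        trial.pop(scores.index(min(scores)))       # discard the least important one
        sse = fit_ols([evaluate(f, xs) for f in trial], y)[1]
        if sse < best_sse:                         # keep the new set only if it improves
            features, best_sse = trial, sse
        history.append(best_sse)
    return features, history

# Toy target chosen for illustration only; it is not one of the paper's benchmarks.
xs = [i / 20 - 1 for i in range(41)]
ys = [x ** 3 + x * math.sin(x) for x in xs]
features, history = kaizen_sketch(xs, ys)
```

The final model stays a small linear combination of named features, which is the interpretability argument the abstract makes; a fuller version would presumably use expression trees and a statistical significance test for importance, neither of which is attempted here.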
