Automatic feature engineering for regression models with machine learning: An evolutionary computation and statistics hybrid

Vinícius Veloso de Melo; Wolfgang Banzhaf

Abstract

Symbolic Regression (SR) is a well-studied task in Evolutionary Computation (EC), where adequate free-form mathematical models must be automatically discovered from observed data. Statisticians, engineers, and general data scientists still prefer traditional regression methods over EC methods because of the solid mathematical foundations, the interpretability of the models, and the lack of randomness, even though such deterministic methods tend to provide lower quality prediction than stochastic EC methods. On the other hand, while EC solutions can be big and uninterpretable, they can be created with less bias, finding high-quality solutions that would be avoided by human researchers. Another interesting possibility is using EC methods to perform automatic feature engineering for a deterministic regression method instead of evolving a single model; this may lead to smaller solutions that can be easy to understand. In this contribution, we evaluate an approach called Kaizen Programming (KP) to develop a hybrid method employing EC and Statistics. While the EC method builds the features, the statistical method efficiently builds the models, which are also used to provide the importance of the features; thus, features are improved over the iterations resulting in better models. Here we examine a large set of benchmark SR problems known from the EC literature. Our experiments show that KP outperforms traditional Genetic Programming - a popular EC method for SR - and also shows improvements over other methods, including other hybrids and well-known statistical and Machine Learning (ML) ones. More in line with ML than EC approaches, KP is able to provide high-quality solutions w
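The loop described in the abstract (an evolutionary step proposes features, a statistical method fits the model and scores each feature, and only useful features survive to the next iteration) can be illustrated with a minimal sketch. This is not the authors' Kaizen Programming implementation. The primitive set, the random-expression generator, the importance measure (absolute coefficient times feature standard deviation), and every function name below are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical primitive set; the abstract does not name one.
UNARY = {"sin": np.sin, "cos": np.cos, "sq": np.square}
BINARY = {"add": np.add, "sub": np.subtract, "mul": np.multiply}


def random_expr(n_vars, depth=2):
    """Build a random expression tree as nested tuples (assumed representation)."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", int(rng.integers(n_vars)))
    if rng.random() < 0.5:
        name = str(rng.choice(list(UNARY)))
        return (name, random_expr(n_vars, depth - 1))
    name = str(rng.choice(list(BINARY)))
    return (name, random_expr(n_vars, depth - 1), random_expr(n_vars, depth - 1))


def evaluate(expr, X):
    """Evaluate an expression tree on the columns of X."""
    if expr[0] == "x":
        return X[:, expr[1]]
    if expr[0] in UNARY:
        return UNARY[expr[0]](evaluate(expr[1], X))
    return BINARY[expr[0]](evaluate(expr[1], X), evaluate(expr[2], X))


def fit_ols(features, X, y):
    """Fit ordinary least squares on the features; return MSE and importances."""
    F = np.column_stack([np.ones(len(y))] + [evaluate(f, X) for f in features])
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    mse = float(np.mean((F @ coef - y) ** 2))
    # Assumed importance proxy: scale of each feature's contribution.
    importance = np.abs(coef[1:]) * F[:, 1:].std(axis=0)
    return mse, importance


def kaizen_sketch(X, y, n_features=4, n_iters=100):
    """Iteratively propose features, score them with OLS, keep the best set."""
    n_vars = X.shape[1]
    features = [random_expr(n_vars) for _ in range(n_features)]
    best_mse, _ = fit_ols(features, X, y)
    for _ in range(n_iters):
        # Evolutionary step: propose new candidate features.
        candidates = features + [random_expr(n_vars) for _ in range(n_features)]
        # Statistical step: fit one model on all candidates and rank them.
        _, importance = fit_ols(candidates, X, y)
        keep = np.argsort(importance)[::-1][:n_features]
        trial = [candidates[i] for i in keep]
        trial_mse, _ = fit_ols(trial, X, y)
        # Accept the new feature set only if the model improves.
        if trial_mse < best_mse:
            features, best_mse = trial, trial_mse
    return features, best_mse
```

Calling `kaizen_sketch(X, y)` with a 2-D array `X` and a target vector `y` returns the surviving feature expressions and the training mean squared error of the final linear model. Because the final model is linear in a small number of explicit features, it stays readable, which is the motivation the abstract gives for engineering features rather than evolving a single monolithic expression.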

Bibliographic record

Source: Information Sciences: An International Journal, 2018, issue 2018, 27 pages
Authors: Vinícius Veloso de Melo; Wolfgang Banzhaf

Affiliations:

Institute of Science and Technology (ICT), Federal University of São Paulo (UNIFESP);

Department of Computer Science and Engineering and BEACON Center for the Study of Evolution in Action, Michigan State University

Original format: PDF
Language: English
Chinese Library Classification: automatic information theory; computer applications; information and knowledge dissemination; automation technology, computer technology
Keywords: Feature engineering; Machine learning; Symbolic regression; Kaizen programming; Linear regression; Genetic programming; Hybrid
