European Conference on Genetic Programming

Balancing Learning and Overfitting in Genetic Programming with Interleaved Sampling of Training Data



Abstract

Generalization is the ability of a model to perform well on cases not seen during the training phase. In Genetic Programming, generalization has recently been recognized as an important open issue, and increased efforts are being made towards evolving models that do not overfit. In this work we expand on recent developments showing that using a small and frequently changing subset of the training data is effective in reducing overfitting and improving generalization. In particular, we build upon the idea of randomly choosing a single training instance at each generation and balance it with periodically using all training data. The motivation for this approach is to keep overfitting low (by using a single training instance) while still presenting enough information for a general pattern to be found (by using all training data). We propose two approaches, called interleaved sampling and random interleaved sampling, that perform this balancing in a deterministic or a probabilistic way, respectively. Experiments are conducted on three high-dimensional real-life datasets from the pharmacokinetics domain. Results show that most variants of the proposed approaches consistently improve generalization and reduce overfitting when compared to standard Genetic Programming. The best variants are even capable of such improvements on a dataset where a recent and representative state-of-the-art method could not achieve them. Furthermore, the resulting models are short and hence easier to interpret, an important achievement from the applications' point of view.

