Journal: Annals of Mathematics and Artificial Intelligence
Algorithm portfolio selection as a bandit problem with unbounded losses

Abstract

We propose a method that learns to allocate computation time to a given set of algorithms, of unknown performance, with the aim of solving a given sequence of problem instances in a minimum time. Analogous meta-learning techniques are typically based on models of algorithm performance, learned during a separate offline training sequence, which can be prohibitively expensive. We adopt instead an online approach, named GambleTA, in which algorithm performance models are iteratively updated, and used to guide allocation on a sequence of problem instances. GambleTA is a general method for selecting among two or more alternative algorithm portfolios. Each portfolio has its own way of allocating computation time to the available algorithms, possibly based on performance models, in which case its performance is expected to improve over time, as more runtime data becomes available. The resulting exploration-exploitation trade-off is represented as a bandit problem. In our previous work, the algorithms corresponded to the arms of the bandit, and allocations evaluated by the different portfolios were mixed, using a solver for the bandit problem with expert advice, but this required the setting of an arbitrary bound on algorithm runtimes, invalidating the optimal regret of the solver. In this paper, we propose a simpler version of GambleTA, in which the allocators correspond to the arms, such that a single portfolio is selected for each instance. The selection is represented as a bandit problem with partial information, and an unknown bound on losses. We devise a solver for this game, proving a bound on its expected regret. We present experiments based on results from several solver competitions, in various domains, comparing GambleTA with another online method.
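The abstract does not spell out the solver, so the following is only a minimal sketch of the general idea: an Exp3-style bandit whose arms are the portfolios (allocators), whose loss is the runtime spent on each instance, and whose loss bound is not known in advance. The class name `AdaptiveExp3`, the parameters `eta` and `gamma`, the restart rule that doubles the bound estimate, and the fixed runtimes in `simulate` are all assumptions made for illustration; they are not the authors' algorithm and carry no regret guarantee.

```python
import math
import random


class AdaptiveExp3:
    """Exp3-style bandit over n_arms with an unknown upper bound on losses.

    Assumption (not from the paper): when an observed loss exceeds the
    current bound estimate, the estimate is raised to twice that loss and
    the cumulative loss estimates are reset.
    """

    def __init__(self, n_arms, eta=0.1, gamma=0.1, seed=None):
        self.n_arms = n_arms
        self.eta = eta          # learning rate (hypothetical default)
        self.gamma = gamma      # uniform exploration rate (hypothetical default)
        self.bound = 1.0        # current estimate of the loss bound
        self.est_loss = [0.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Subtract the minimum before exponentiating for numerical stability.
        base = min(self.est_loss)
        weights = [math.exp(-self.eta * (l - base)) for l in self.est_loss]
        total = sum(weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in weights
        ]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, loss):
        if loss > self.bound:
            # Bound violated: raise the estimate and restart the statistics.
            self.bound = 2.0 * loss
            self.est_loss = [0.0] * self.n_arms
        # Importance-weighted estimate; only the chosen arm's loss is
        # observed (partial information).
        self.est_loss[arm] += (loss / self.bound) / prob


def simulate(runtimes, rounds=2000, seed=0):
    """Run the bandit on arms with fixed, made-up runtimes per instance."""
    bandit = AdaptiveExp3(len(runtimes), seed=seed)
    for _ in range(rounds):
        arm, prob = bandit.select()
        bandit.update(arm, prob, runtimes[arm])
    return bandit


# Portfolio 0 always needs 1 time unit, portfolio 1 always needs 10.
bandit = simulate([1.0, 10.0])
probs = bandit.probabilities()
```

After the run, `probs` should place most of the selection probability on the faster portfolio, while `gamma` keeps a small amount of exploration on the slower one. Real runtimes would vary per instance, which is where the exploration-exploitation trade-off described in the abstract becomes relevant.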