IEEE International Conference on Intelligent Transportation Systems

Adaptive Stress Testing without Domain Heuristics using Go-Explore

Abstract

Recently, reinforcement learning (RL) has been used as a tool for finding failures in autonomous systems. During execution, the RL agents often rely on some domain-specific heuristic reward to guide them towards finding failures, but constructing such a heuristic may be difficult or infeasible. Without a heuristic, the agent may only receive rewards at the time of failure, or even rewards that guide it away from failures. For example, some approaches give rewards for taking more likely actions, in order to find more likely failures. However, the agent may then learn to only take likely actions, and may not be able to find a failure at all. Consequently, the problem becomes a hard-exploration problem, where rewards do not aid exploration. A new algorithm, Go-Explore (GE), has recently set new records on benchmarks from the hard-exploration field. We apply GE to adaptive stress testing (AST), one example of an RL-based falsification approach that provides a way to search for the most likely failure scenario. We simulate a scenario where an autonomous vehicle drives while a pedestrian is crossing the road. We demonstrate that GE is able to find failures without domain-specific heuristics, such as the distance between the car and the pedestrian, on scenarios that other RL techniques are unable to solve. Furthermore, inspired by the robustification phase of GE, we demonstrate that the backwards algorithm (BA) improves the failures found by other RL techniques.
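To make the reward structure described in the abstract concrete, below is a minimal sketch of a per-step reward for an AST-style falsifier. This is not the paper's implementation; the function name ast_step_reward, the arguments log_prob, is_failure, is_terminal, and miss_distance, and the penalty constants are all illustrative assumptions. With miss_distance left as None, the agent only sees the action-likelihood term during the rollout and a large penalty if the episode ends without a failure, which is the heuristic-free, hard-exploration setting the paper addresses with Go-Explore.

```python
from typing import Optional

def ast_step_reward(log_prob: float,
                    is_failure: bool,
                    is_terminal: bool,
                    miss_distance: Optional[float] = None) -> float:
    """Illustrative AST-style reward (hypothetical names and constants).

    log_prob      -- log-likelihood of the disturbance the agent injected this
                     step, so maximizing return favors *likely* failure scenarios.
    is_failure    -- True if the simulator reports a failure event (e.g. a collision).
    is_terminal   -- True if the rollout ended without reaching a failure.
    miss_distance -- optional domain heuristic (e.g. closest car-pedestrian
                     distance); the paper's point is that this term can be dropped.
    """
    reward = log_prob  # reward likely disturbances at every step
    if is_terminal and not is_failure:
        # Large penalty for ending the episode without a failure.
        reward -= 1.0e4
        if miss_distance is not None:
            # Heuristic shaping: penalize rollouts that stayed far from failure.
            reward -= 1.0e3 * miss_distance
    return reward
```

Without the shaping term, this reward is uninformative until the end of an episode, so reward-driven exploration struggles; Go-Explore sidesteps this by archiving visited simulator states and returning to promising ones before exploring further, which is what allows it to find failures without the domain heuristic.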
