The talk gives an overview of a relatively recent trend in the verification of critical embedded controllers: the use of (possibly deep) reinforcement learning algorithms for property falsification. The central idea is to formulate safety objectives in temporal logics with real-valued robust semantics, and to cast the property falsification problem as a reward optimization problem, which can be solved with reinforcement learning algorithms for optimal planning or optimal policy synthesis. After introducing basic definitions and concepts, we review a collection of landmark papers, then illustrate the approach with results obtained on a significant Airbus case study. Finally, we outline current challenges and future research directions.
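The Python sketch below makes the reward formulation concrete. It rests on several assumptions. The property is the signal temporal logic formula G (x < c), and its robustness over a finite trace is the smallest margin c - x[t]. The plant (`simulate`), its dynamics, and every parameter are hypothetical toys. Plain random search stands in for the reinforcement learning optimizer, so the sketch illustrates only the objective, not the learning algorithm. Under this formulation, falsification means maximizing the reward -robustness, and any trace with negative robustness is a counterexample.

```python
import random
from typing import List, Optional, Tuple


def robustness_always_below(trace: List[float], threshold: float) -> float:
    """Robust semantics of the STL formula G (x < threshold) on a finite trace.

    Positive value: satisfied with that margin; negative value: violated.
    """
    return min(threshold - x for x in trace)


def simulate(inputs: List[float], x0: float = 0.0, decay: float = 0.9) -> List[float]:
    """Hypothetical toy plant x[t+1] = decay * x[t] + u[t], driven by the inputs."""
    trace = [x0]
    x = x0
    for u in inputs:
        x = decay * x + u
        trace.append(x)
    return trace


def falsify(threshold: float, horizon: int = 20, episodes: int = 500,
            seed: int = 0) -> Tuple[Optional[List[float]], float]:
    """Search for inputs that maximize the reward -robustness.

    Plain random search is used here as a stand-in for an RL optimizer.
    """
    rng = random.Random(seed)
    best_inputs: Optional[List[float]] = None
    best_reward = float("-inf")
    for _ in range(episodes):
        inputs = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        reward = -robustness_always_below(simulate(inputs), threshold)
        if reward > best_reward:
            best_inputs, best_reward = inputs, reward
    # Return the best robustness found; a negative value means a counterexample.
    return best_inputs, -best_reward


inputs, rob = falsify(threshold=0.5)
print(rob < 0)  # True: the toy property is falsified
```

In an actual reinforcement learning setup, an agent observing the plant state would choose each u[t], replacing the independent sampling loop. The robustness-based reward would stay the same.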