Poisoning finite-horizon Markov decision processes at design time

Caballero William N.; Jenkins Phillip R.; Keith Andrew J.

Abstract

The contemporary decision-making environment is becoming increasingly automated. Developments in artificial intelligence, machine learning, and operations research have increased the prevalence of computer systems in decision-making tasks across a myriad of applications. Markov decision processes (MDPs) are utilized in a variety of system controllers, and attacks against them are of particular interest, even though this problem structure is relatively understudied in the adversarial learning literature. Therefore, in this research, we consider the finite-horizon MDP poisoning problem, wherein an adversary perturbs a decision maker's baseline MDP formulation to induce desired behavior while balancing the risk of attack detection. We formally define the associated mathematical programming formulation as a mixed-integer bilevel programming problem. We provide a single-level representation that can be handled by some commercial global solvers, but, since their performance is frequently inadequate, we develop gradient-based, gradient-free, and bifurcation heuristic solution methodologies that include self-tuning extensions. The performance of these algorithms is explored on a wide array of sample problem instances to determine their relative efficacy in terms of solution quality and computational effort for different finite-horizon MDP structures. Published by Elsevier Ltd.
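
The poisoning setting described in the abstract can be illustrated with a small sketch. The Python example below is not the authors' mixed-integer bilevel formulation or any of their heuristics. It only assumes a toy two-state, two-action MDP with zero terminal value, solves it by standard backward induction, and then shows how a hand-picked reward perturbation (the hypothetical `delta`) changes the decision maker's optimal policy. The Euclidean norm of `delta` is used here as an assumed stand-in for detection risk. All names and numbers are illustrative.

```python
import numpy as np

def backward_induction(P, R, horizon):
    """Solve a finite-horizon MDP by backward induction.

    P: array (A, S, S); P[a, s, s2] is the probability of s -> s2 under action a.
    R: array (S, A); immediate reward for taking action a in state s.
    Returns (policy, V) with shapes (horizon, S) and (horizon + 1, S).
    """
    n_states, n_actions = R.shape
    V = np.zeros((horizon + 1, n_states))  # assumption: zero terminal value
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in range(horizon - 1, -1, -1):
        # Q[s, a] = R[s, a] + sum over s2 of P[a, s, s2] * V[t + 1, s2]
        Q = R + np.einsum("ast,t->sa", P, V[t + 1])
        policy[t] = np.argmax(Q, axis=1)
        V[t] = np.max(Q, axis=1)
    return policy, V

# Toy baseline MDP (illustrative only): action 0 stays put, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([
    [1.0, 0.0],  # state 0: staying pays 1.0, switching pays 0.0
    [0.0, 0.5],  # state 1: staying pays 0.0, switching pays 0.5
])
HORIZON = 3

base_policy, base_V = backward_induction(P, R, HORIZON)

# Hypothetical attack: the adversary wants action 1 chosen in state 0 and
# shifts the state-0 rewards just enough to flip the preference.
delta = np.array([
    [-0.6, 0.6],
    [0.0, 0.0],
])
poisoned_policy, poisoned_V = backward_induction(P, R + delta, HORIZON)

# Assumed proxy for detection risk: the size of the perturbation.
perturbation_size = np.linalg.norm(delta)

print("baseline policy in state 0 by stage:", base_policy[:, 0])
print("poisoned policy in state 0 by stage:", poisoned_policy[:, 0])
print("perturbation size:", perturbation_size)
```

In this sketch the perturbation is chosen by hand. In the problem the abstract describes, the adversary would instead search for a perturbation that trades off the induced behavior against the risk of detection, which is what makes the formulation bilevel: the decision maker's optimal response to the perturbed MDP sits inside the adversary's optimization.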

Bibliographic details

Source
Computers & Operations Research | 2021, Issue 5 | 105185.1-105185.17 | 17 pages
Authors
Caballero William N.; Jenkins Phillip R.; Keith Andrew J.
Author affiliations

Air Force Inst Technol Dept Operat Sci 2950 Hobson Way Wright Patterson AFB OH 45433 USA;

Air Force Inst Technol Dept Operat Sci 2950 Hobson Way Wright Patterson AFB OH 45433 USA;

Air Force Studies Anal & Assessments 1690 Air Force Pentagon Washington DC 20330 USA;

Indexing information
Original format: PDF
Text language: English
Keywords
Markov decision process; Adversarial learning; Data poisoning; Machine learning; Reinforcement learning;
