ACM Transactions on Intelligent Systems and Technology

Two Can Play That Game: An Adversarial Evaluation of a Cyber-Alert Inspection System

Abstract

Cyber-security is an important societal concern. Cyber-attacks have increased both in number and in the extent of damage caused by each attack. Large organizations operate a Cyber Security Operation Center (CSOC), which forms the first line of cyber-defense. The inspection of cyber-alerts is a critical part of CSOC operations (defender or blue team). Recent work proposed a reinforcement learning (RL)-based approach for the defender's decision-making to prevent the cyber-alert queue length from growing large and overwhelming the defender. In this article, we perform a red team (adversarial) evaluation of this approach. With the recent attacks on learning-based decision-making systems, it is even more important to test the limits of the defender's RL approach. Toward that end, we learn several adversarial alert-generation policies as best responses against various defender inspection policies. Surprisingly, we find the defender's policies to be quite robust to the attacker's best response. To explain this observation, we extend the earlier defender's RL model to a game model with adversarial RL, and show that there exist defender policies that can be robust against any adversarial policy. We also derive a competitive baseline from the game theory model and compare it to the defender's RL approach. However, when we go further and exploit the assumptions made in the Markov Decision Process (MDP) underlying the defender's RL model, we discover an attacker policy that overwhelms the defender. We use a double-oracle-like approach to retrain the defender with episodes from this discovered attacker policy. This makes the defender robust to the discovered attacker policy, and no further harmful attacker policies are discovered. Overall, adversarial RL and the double-oracle approach in RL are general techniques that are applicable to other uses of RL in adversarial environments.
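The abstract does not spell out the defender's decision model, but the earlier RL work it evaluates is built on an MDP over the cyber-alert queue. Below is a minimal sketch of such a queue MDP, assuming a simple formulation: the class name AlertQueueEnv, the Poisson arrival process, and the fixed inspection budget are illustrative assumptions, not the article's actual model.

```python
import numpy as np

class AlertQueueEnv:
    """Toy alert-queue MDP (hypothetical): the state is the backlog of
    uninspected alerts, the action is how many alerts to inspect this
    period, and the reward penalizes letting the backlog grow."""

    def __init__(self, arrival_rate=3.0, budget=4, max_queue=50, seed=0):
        self.rng = np.random.default_rng(seed)
        self.arrival_rate = arrival_rate  # mean alert arrivals per period
        self.budget = budget              # inspection capacity per period
        self.max_queue = max_queue        # cap keeps the state space finite
        self.queue = 0

    def reset(self):
        self.queue = 0
        return self.queue

    def step(self, action):
        inspected = min(action, self.budget, self.queue)
        arrivals = self.rng.poisson(self.arrival_rate)
        self.queue = min(self.queue - inspected + arrivals, self.max_queue)
        reward = -self.queue              # a long queue overwhelms the defender
        return self.queue, reward

env = AlertQueueEnv()
state = env.reset()
for _ in range(5):
    state, reward = env.step(action=env.budget)  # inspect at full capacity
```

In the game extension described above, the arrival process is no longer a fixed distribution: the attacker's alert-generation policy shapes what enters the queue, which is precisely the lever the learned adversarial policies pull on.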
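The double-oracle-like retraining step can likewise be sketched. In the article, each oracle would retrain an RL policy against the opponent's current mixture; in the toy version below, purely for illustration, the oracles instead scan a precomputed payoff matrix of per-policy-pair episode returns. The names solve_zero_sum and double_oracle, and the random matrix G, are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(payoff):
    """Maximin mixed strategy of the row player in a zero-sum matrix game."""
    m, n = payoff.shape
    # Variables [p_1..p_m, v]: maximize v s.t. payoff.T @ p >= v, sum(p) = 1.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # linprog minimizes, so use -v
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], -res.fun

def double_oracle(full_payoff, max_iters=50):
    """Toy double oracle: grow both policy sets with best responses until
    neither side can improve on the restricted game's equilibrium."""
    M, N = full_payoff.shape
    D, A = [0], [0]                               # policy indices found so far
    for _ in range(max_iters):
        sub = full_payoff[np.ix_(D, A)]
        p, v = solve_zero_sum(sub)                # defender equilibrium mixture
        q, _ = solve_zero_sum(-sub.T)             # attacker equilibrium mixture
        atk_mix = np.zeros(N)
        atk_mix[A] = q
        d_br = int(np.argmax(full_payoff @ atk_mix))  # defender oracle
        def_mix = np.zeros(M)
        def_mix[D] = p
        a_br = int(np.argmin(def_mix @ full_payoff))  # attacker oracle
        grew = False
        if d_br not in D:
            D.append(d_br)
            grew = True
        if a_br not in A:
            A.append(a_br)
            grew = True
        if not grew:                              # no improving response left
            break
    return D, A, p, q, v

rng = np.random.default_rng(1)
G = rng.normal(size=(6, 6))                       # hypothetical episode returns
D, A, p, q, v = double_oracle(G)
print("defender support:", D, "attacker support:", A, "value:", round(v, 3))
```

The convergence test mirrors the behavior reported in the abstract: once retraining against the discovered attacker policy yields no new harmful best response, the loop stops with a defender strategy that is robust within the explored policy space.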
