Inaccuracy of State-Action Value Function For Non-Optimal Actions in Adversarially Trained Deep Neural Policies

Abstract

The introduction of deep neural networks as function approximators for the state-action value function has created a new research area: self-learning systems that learn policies from high-dimensional inputs. While the success of deep neural policies has led to their deployment in diverse application domains, there are significant concerns about their robustness to specifically crafted malicious perturbations of their inputs. Several studies have sought to make deep neural policies resistant to such perturbations by training them in the presence of these perturbations (i.e., adversarial training). In this paper we investigate the state-action value functions learned by state-of-the-art adversarially trained deep neural policies and by vanilla trained deep neural policies. We perform several experiments using OpenAI Baselines and show that the state-action value functions learned by vanilla trained deep neural policies estimate the values of non-optimal actions better than those learned by state-of-the-art adversarially trained deep neural policies. We believe our study reveals intriguing properties of adversarial training and could be a critical step towards obtaining robust and reliable policies.
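To make the terminology concrete, here is a minimal illustrative sketch that is not taken from the paper. It shows what the greedy ("optimal") and non-optimal actions of a state-action value function are, and one standard way of crafting an input perturbation against it: a single step of the fast gradient sign method (FGSM) with a simplified objective that lowers the Q-value of the greedy action. It assumes a hypothetical linear Q-function Q(s, a) = w_a · s, so the input gradient is just w_a. The paper itself uses deep neural networks, and the names `q_values`, `rank_actions` and `fgsm_perturb` are hypothetical.

```python
from typing import List

Vector = List[float]


def q_values(weights: List[Vector], state: Vector) -> Vector:
    """Q(s, a) for every action a under a hypothetical linear model Q(s, a) = w_a . s."""
    return [sum(w_i * s_i for w_i, s_i in zip(w, state)) for w in weights]


def rank_actions(q: Vector) -> List[int]:
    """Action indices sorted by decreasing Q-value.

    Index 0 of the result is the greedy (estimated optimal) action;
    the remaining entries are the non-optimal actions, best to worst.
    """
    return sorted(range(len(q)), key=lambda a: q[a], reverse=True)


def sign(x: float) -> int:
    """Sign function: +1, 0 or -1."""
    return (x > 0) - (x < 0)


def fgsm_perturb(weights: List[Vector], state: Vector, epsilon: float) -> Vector:
    """One FGSM-style step that lowers Q(s, a*) for the greedy action a*.

    For the linear model, grad_s Q(s, a*) = w_{a*}, so the step is
    -epsilon * sign(w_{a*}); every input coordinate moves by at most epsilon.
    """
    greedy = rank_actions(q_values(weights, state))[0]
    return [s_i - epsilon * sign(w_i) for s_i, w_i in zip(state, weights[greedy])]


# Toy example: 3 actions, 2-dimensional state (illustrative numbers only).
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
state = [0.6, 0.5]
print(rank_actions(q_values(weights, state)))        # [0, 2, 1]
adversarial = fgsm_perturb(weights, state, epsilon=0.2)
print(rank_actions(q_values(weights, adversarial)))  # [1, 2, 0]
```

In this toy case a perturbation of at most 0.2 per input coordinate changes the greedy action from 0 to 1 and also reorders the remaining, non-optimal actions.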

Bibliographic details

Source: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021, pp. 2323-2327 (5 pages)
Author: Ezgi Korkmaz
Original format: PDF
Keywords: Training; Resistance; Deep learning; Computer vision; Systematics; Perturbation methods; Conferences

Similar documents

1. Learning with policy prediction in continuous state-action multi-agent decision processes [J]. Soft Computing: A Fusion of Foundations, Methodologies and Applications, 2020, issue 2.

机译：在连续状态动作多代理决策过程中学习策略预测
2. On the empirical state-action frequencies in Markov decision processes under general policies [J]. Mannor S, Tsitsiklis JN. Mathematics of Operations Research, 2005, issue 3.

机译：一般策略下马尔可夫决策过程中的经验状态作用频率
3. Q-Learning in Continuous State-Action Space with Noisy and Redundant Inputs by Using a Selective Desensitization Neural Network [J]. Takaaki Kobayashi, Takeshi Shibuya, Masahiko Morita. Journal of Advanced Computational Intelligence and Intelligent Informatics, 2015, issue 6a113.

机译：通过使用选择性脱敏神经网络在具有噪声和冗余输入的连续状态-动作空间中进行Q学习
4. Empirical Evaluation on Robustness of Deep Convolutional Neural Networks Activation Functions Against Adversarial Perturbation [C]. Jiawei Su, Danilo Vasconcellos Vargas, Kouichi Sakurai. International Symposium on Computing and Networking Workshops, 2018.

机译：深卷积神经网络激活函数对抗对抗性摄动的鲁棒性的经验评估
5. A Tale of Fairness Revisited: Beyond Adversarial Learning for Deep Neural Network Fairness [D]. Becky Mashaido. 2022.

机译：重新审查的公平性：超越深层神经网络公平的对抗学习
6. Automated Pulmonary Nodule Classification in Computed Tomography Images Using a Deep Convolutional Neural Network Trained by Generative Adversarial Networks [O]. Yuya Onishi, Atsushi Teramoto, Masakazu Tsujimoto. 2006.

机译：使用由生成对抗网络训练的深卷积神经网络在计算机断层扫描图像中自动进行肺结节分类
7. Parametric Noise Injection: Trainable Randomness to Improve Deep Neural Network Robustness Against Adversarial Attack [O]. Zhezhi He, Adnan Siraj Rakin, Deliang Fan. 2019.

机译：参数噪声注射：可训练随机性，以提高对抗对抗攻击的深度神经网络鲁棒性
