Proposal of the Continuous-Valued Penalty Avoiding Rational Policy Making Algorithm

Kazuteru Miyazaki

Abstract

Applying reinforcement learning to real-world problems sometimes requires handling continuous-valued input and output. We previously proposed an approach called Exploitation-oriented Learning (XoL), which strongly reinforces successful experience and thereby reduces the number of trial-and-error searches. A method based on Penalty-Avoiding Rational Policymaking (PARP) has been proposed as an XoL method for continuous-valued input, but it does not handle actions with continuous-valued output. We study how to treat continuous-valued output in an XoL method for environments that include both a reward and a penalty, and we extend PARP from continuous-valued input to continuous-valued output. We apply our proposal to the pole-cart balancing problem and a biped LEGO robot, and confirm its effectiveness.

Bibliographic details

Source
Journal of Advanced Computational Intelligence and Intelligent Informatics | 2012, Issue 90 | 8 pages
Author
Kazuteru Miyazaki
Original format: PDF
Language: English
Chinese Library Classification: Other computers
Keywords
Reinforcement learning; Profit sharing; PARP; Exploitation-oriented Learning (XoL)
