A learning method and a learning device for performing customized route planning by supporting reinforcement learning by using human travel data as training data

Abstract

PROBLEM TO BE SOLVED: To provide a learning method and a learning device for optimizing route planning for each passenger, and a testing method and a testing device using the same. A learning device performs a process S01-1 of causing an adjustment reward network to generate first adjustment rewards by referring to information on the actual situation vectors and actual actions included in a travel trajectory, a process of causing a common reward module to generate first common rewards by referring to the same information on the actual situation vectors and actual actions, and a process S01-2 of causing a prediction network to calculate actual expected values by referring to the actual situation vectors. The learning device then learns the parameters of the adjustment reward network by performing a step S02 of causing a first loss layer to generate an adjustment reward loss and to perform backpropagation using that loss. Step S03 is then performed. [Selected drawing] Fig. 3
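The data flow in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the adjustment reward loss, the network architectures, or the common reward rule, so everything below is an assumption made for illustration: the squared residual loss, the linear adjustment reward model, the discount factor `gamma`, and all function names (`common_reward`, `predicted_value`, `adjustment_reward`, `loss_and_grad`, `train`) are hypothetical and are not taken from the patent. The analytic gradient of the linear model stands in for backpropagation.

```python
from typing import List, Sequence, Tuple

# One trajectory step: (actual situation vector, actual action).
Step = Tuple[Sequence[float], float]


def common_reward(state: Sequence[float], action: float) -> float:
    # Hypothetical common reward rule shared by all passengers:
    # penalize large actions. The abstract does not specify this rule.
    return -abs(action)


def predicted_value(state: Sequence[float]) -> float:
    # Hypothetical stand-in for the prediction network; kept fixed here.
    return sum(state)


def adjustment_reward(weights: List[float], state: Sequence[float], action: float) -> float:
    # Assumed linear adjustment reward network over (state, action) features.
    feats = list(state) + [action]
    return sum(w * f for w, f in zip(weights, feats))


def loss_and_grad(weights: List[float], trajectory: List[Step], gamma: float = 0.9):
    # Assumed placeholder loss: squared residual between the total reward plus
    # the discounted next expected value and the current expected value.
    loss = 0.0
    grad = [0.0] * len(weights)
    for t in range(len(trajectory) - 1):
        state, action = trajectory[t]
        next_state, _ = trajectory[t + 1]
        feats = list(state) + [action]
        residual = (
            common_reward(state, action)
            + adjustment_reward(weights, state, action)
            + gamma * predicted_value(next_state)
            - predicted_value(state)
        )
        loss += residual ** 2
        # Gradient with respect to the adjustment reward weights only.
        for i, f in enumerate(feats):
            grad[i] += 2.0 * residual * f
    return loss, grad


def train(trajectory: List[Step], steps: int = 200, lr: float = 0.05) -> List[float]:
    # Only the adjustment reward parameters are updated, as in step S02.
    weights = [0.0] * (len(trajectory[0][0]) + 1)
    for _ in range(steps):
        _, grad = loss_and_grad(weights, trajectory)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights
```

In this sketch only the adjustment reward weights change during training, while the common reward rule and the value estimate stay fixed, which mirrors the abstract's statement that the loss is used to learn the parameters of the adjustment reward network.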
