IEEE International Conference on Automation Science and Engineering

Path planning with user route preference - A reward surface approximation approach using orthogonal Legendre polynomials



Abstract

As self-driving cars become more ubiquitous, users will look for natural ways of informing the car's AI about their personal choice of routes. This choice is not always dictated by straightforward logic such as shortest distance or shortest time, and can be influenced by hidden factors such as comfort and familiarity. This paper presents a path-learning algorithm for such applications, in which, from limited positive demonstrations, an autonomous agent learns the user's path preference and honors that choice in its route planning, while retaining the capability to adopt alternate routes if the original choice(s) become impractical. The learning problem is modeled as a Markov decision process. The states (way-points) and actions (moves from one way-point to another) are pre-defined according to the existing network of paths between the origin and destination, and the user's demonstration is assumed to be a sample of the preferred path. The underlying reward function that captures the essence of the demonstration is computed using an inverse reinforcement learning algorithm, and from it the entire path mirroring the expert's demonstration is extracted. To alleviate the problem of state-space explosion when dealing with a large state space, the reward function is approximated using a set of orthogonal Legendre polynomial basis functions with a fixed number of coefficients, regardless of the size of the state space. A six-fold reduction in total learning time is achieved compared to using simple basis functions, whose dimensionality equals the number of distinct states.
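The abstract itself contains no code; the following is a minimal illustrative sketch of the reward-surface idea it describes, assuming way-point coordinates normalized to [-1, 1] (the natural domain of Legendre polynomials). The helper names (fit_reward_surface, reward), the polynomial degree, and the toy reward values are assumptions for illustration, not taken from the paper.

```python
# Sketch: approximate a reward surface over 2-D way-point coordinates with a
# truncated series of orthogonal Legendre polynomials, so the number of
# learned parameters stays fixed regardless of how many states there are.

import numpy as np
from numpy.polynomial import legendre as L


def fit_reward_surface(states_xy, rewards, degree=4):
    """Least-squares fit of a 2-D Legendre series to per-state rewards.

    states_xy : (N, 2) way-point coordinates, normalized to [-1, 1]
    rewards   : (N,) reward estimates at those way-points (e.g. from IRL)
    Returns a (degree + 1, degree + 1) coefficient matrix; its size is
    fixed no matter how many distinct states N the network contains.
    """
    x, y = states_xy[:, 0], states_xy[:, 1]
    # Build one design-matrix column per basis term P_i(x) * P_j(y).
    cols = []
    for i in range(degree + 1):
        for j in range(degree + 1):
            c = np.zeros((degree + 1, degree + 1))
            c[i, j] = 1.0
            cols.append(L.legval2d(x, y, c))
    A = np.stack(cols, axis=1)                      # shape (N, (degree+1)^2)
    coef_flat, *_ = np.linalg.lstsq(A, rewards, rcond=None)
    return coef_flat.reshape(degree + 1, degree + 1)


def reward(coef, x, y):
    """Evaluate the approximated reward surface at normalized (x, y)."""
    return L.legval2d(x, y, coef)


# Toy usage: 400 way-points, but only (4 + 1)^2 = 25 coefficients to learn.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(400, 2))
true_r = np.exp(-2.0 * ((pts[:, 0] - 0.3) ** 2 + (pts[:, 1] + 0.2) ** 2))
coef = fit_reward_surface(pts, true_r, degree=4)
print(reward(coef, 0.3, -0.2))   # roughly recovers the peak reward of 1.0
```

The fixed coefficient count is what makes the reported speed-up plausible: with a simple per-state basis the number of weights grows with the number of way-points, whereas here it depends only on the chosen polynomial degree.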
