State-aggregation algorithms for learning probabilistic models for robot control.

Abstract

This thesis addresses the problem of learning probabilistic representations of dynamical systems with non-linear dynamics and hidden state in the form of partially observable Markov decision process (POMDP) models, with the explicit purpose of using these models for robot control. In contrast to the usual approach to learning probabilistic models, which iteratively adjusts probabilities so as to improve the likelihood of the observed data, the algorithms proposed in this thesis take a different approach: they reduce the learning problem to one of state aggregation, clustering points in an embedding space of delayed coordinates and subsequently estimating transition probabilities between the aggregated states (clusters). This approach has close ties to the dominant methods for system identification in the field of control engineering, although the characteristics of POMDP models require very different algorithmic solutions.

In addition to an extensive investigation of their performance in simulation, the proposed algorithms are applied to two robots built in the course of our experiments. The first is a differential-drive mobile robot with a minimal number of proximity sensors, which has to perform the well-known robotic task of self-localization along the perimeter of its workspace. In comparison with previous neural-network-based approaches to the same problem, our algorithm achieved much higher spatial accuracy of localization. The second task is visual servo control of an under-actuated arm that has to rotate an attached flying ball so as to maintain maximal height of rotation with minimal energy expenditure. Even though this problem is intractable for known control-engineering methods because of its strongly non-linear dynamics and partially observable state, a control policy obtained by policy iteration on a POMDP model learned by our state-aggregation algorithm outperformed several alternative open-loop and closed-loop controllers.
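The pipeline the abstract describes lends itself to a compact illustration: embed the observation sequence in delayed coordinates, aggregate the embedded points by clustering, and estimate per-action transition probabilities between the resulting discrete states. The sketch below is a minimal, hypothetical rendering of that pipeline, not the thesis's actual algorithm: it assumes a scalar or vector observation sequence and integer-coded actions, and it uses k-means as the clustering step (the abstract does not name a specific clustering method). The names `delay_embed` and `learn_aggregated_model` are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def delay_embed(obs, k):
    """Stack k consecutive observations into delayed-coordinate vectors."""
    obs = np.asarray(obs)
    return np.array([obs[t:t + k].ravel() for t in range(len(obs) - k + 1)])

def learn_aggregated_model(obs, actions, k=3, n_states=8, seed=0):
    """Cluster delay-embedded observations into aggregated states and
    estimate per-action transition probabilities between them."""
    X = delay_embed(obs, k)
    states = KMeans(n_clusters=n_states, n_init=10, random_state=seed).fit_predict(X)

    n_actions = int(np.max(actions)) + 1
    counts = np.zeros((n_actions, n_states, n_states))
    for t in range(len(states) - 1):
        # One possible convention (an assumption here): the transition from
        # window t to window t+1 is driven by the action taken after the
        # last observation in window t.
        a = int(actions[t + k - 1])
        counts[a, states[t], states[t + 1]] += 1

    # Row-normalize counts into transition matrices, with light smoothing
    # so unvisited (action, state) pairs still define a distribution.
    P = (counts + 1e-3) / (counts + 1e-3).sum(axis=2, keepdims=True)
    return states, P
```

Given the learned transition matrices, a control policy could then be computed with a standard solution method; the abstract's control results use policy iteration on the learned POMDP model for that final step.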