Conference proceedings: AI 2003: Advances in Artificial Intelligence

On How to Learn from a Stochastic Teacher or a Stochastic Compulsive Liar of Unknown Identity

Abstract

We consider the problem of a learning mechanism (a robot or an algorithm) that learns a parameter while interacting with either a stochastic teacher or a stochastic compulsive liar. The problem is modeled as follows: the learning mechanism tries to locate an unknown point on a real interval by interacting with a stochastic environment through a series of guesses. For each guess, the environment (teacher) informs the mechanism, possibly erroneously, which way it should move to reach the point. Thus, there is a non-zero probability that the feedback from the environment is erroneous. When the probability of a correct response is p > 0.5, the environment is said to be Informative, and we have the case of learning from a stochastic teacher. When this probability is p < 0.5, the environment is deemed Deceptive and is called a stochastic compulsive liar. This paper describes a novel learning strategy by which the unknown parameter can be learned in both environments. To the best of our knowledge, our results are the first reported that apply to the latter scenario. Another main contribution of this paper is that the proposed scheme is shown to operate equally well even when the learning mechanism does not know whether the environment is Informative or Deceptive. The learning strategy proposed here, called CPL-ATS, partitions the search interval into three equal-sized sub-intervals, evaluates the location of the unknown point with respect to these sub-intervals using fast-converging ε-optimal L_RI learning automata, and prunes the search space in each iteration by eliminating at least one partition. The CPL-ATS algorithm is proven to converge to the unknown point to an arbitrary degree of accuracy with probability as close to unity as desired. Comprehensive experimental results confirm the fast and accurate convergence of the search for a wide range of values of the environment's feedback accuracy parameter p.
The above algorithm can be used to learn parameters for non-linear optimization techniques.
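
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the CPL-ATS algorithm itself: it replaces the L_RI learning automata with simple majority voting, and it guesses the environment type by probing the left endpoint of the interval, where the true direction is assumed to be "right". The names `make_environment`, `majority_direction`, and `trisection_search`, and the parameters `samples` and `tol`, are hypothetical and chosen only for illustration.

```python
import random


def make_environment(target, p, rng):
    """Build a feedback function that is correct with probability p.

    p > 0.5 models a stochastic teacher (Informative environment);
    p < 0.5 models a stochastic compulsive liar (Deceptive environment).
    """
    def feedback(guess):
        truth = "right" if guess < target else "left"
        if rng.random() < p:
            return truth
        return "left" if truth == "right" else "right"
    return feedback


def majority_direction(feedback, guess, samples):
    """Query the environment `samples` times and return the majority answer."""
    rights = sum(feedback(guess) == "right" for _ in range(samples))
    return "right" if 2 * rights > samples else "left"


def trisection_search(feedback, lo=0.0, hi=1.0, tol=1e-3, samples=201):
    """Simplified three-way search; an illustration, not the paper's CPL-ATS."""
    # Assumption: the target lies strictly inside (lo, hi), so the truthful
    # answer at lo is "right". A majority of "left" suggests a liar.
    deceptive = majority_direction(feedback, lo, samples) == "left"

    def direction(x):
        d = majority_direction(feedback, x, samples)
        if deceptive:  # invert the answers of a detected liar
            d = "left" if d == "right" else "right"
        return d

    while hi - lo > tol:
        a = lo + (hi - lo) / 3
        b = lo + 2 * (hi - lo) / 3
        if direction(a) == "left":      # target is in the left third
            hi = a
        elif direction(b) == "right":   # target is in the right third
            lo = b
        else:                           # target is in the middle third
            lo, hi = a, b
    return (lo + hi) / 2


if __name__ == "__main__":
    for p in (0.8, 0.2):
        env = make_environment(target=0.9123, p=p, rng=random.Random(0))
        print(p, trisection_search(env))
```

Each iteration discards at least one of the three sub-intervals, which mirrors the pruning step the abstract describes. The majority vote only works when p is far enough from 0.5 for the chosen number of samples; the paper's automata-based scheme and its convergence proof are not reproduced here.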
