Conservative learning algorithm for safe personalized recommendation

Abstract

A digital medium environment includes an action processing application that performs actions including personalized recommendation. A learning algorithm operates on a sample-by-sample basis (e.g., each time a user visits a web page) and recommends either an optimistic action, such as an action found by maximizing an expected reward, or a base action, such as an action from a baseline policy with known expected reward, subject to a safety constraint. The safety constraint requires that the expected performance of playing optimistic actions be at least as good as a predetermined percentage of the known performance of playing base actions. Thus, the learning algorithm is conservative during the exploratory early stages of learning and does not play unsafe actions. Furthermore, because the learning algorithm is online and can learn from each sample, it converges quickly and can track time-varying parameters better than learning algorithms that learn on a block basis.
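
The selection rule described in the abstract can be illustrated with a small sketch. This is not the patented method: it is a toy multi-armed bandit, and the function name `conservative_ucb`, the parameter `alpha` (the allowed fractional shortfall relative to the baseline), the Bernoulli reward model, and the Hoeffding-style confidence width are all assumptions made for illustration. The source does not specify how the confidence bounds are computed.

```python
import math
import random


def conservative_ucb(true_means, baseline_arm, alpha, horizon, seed=0):
    """Toy conservative bandit loop (illustrative assumption, not the patent's algorithm).

    At each sample the optimistic arm is played only if a pessimistic
    estimate of cumulative reward stays at or above (1 - alpha) times the
    reward the baseline would have earned; otherwise the baseline is played.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    base_mean = true_means[baseline_arm]  # baseline reward assumed known
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    choices = []

    for t in range(1, horizon + 1):
        ucb, lcb = [], []
        for a in range(n_arms):
            if a == baseline_arm:
                # Known expected reward: no uncertainty to account for.
                ucb.append(base_mean)
                lcb.append(base_mean)
            elif counts[a] == 0:
                # Unplayed arm: rewards are only known to lie in [0, 1].
                ucb.append(1.0)
                lcb.append(0.0)
            else:
                mean = sums[a] / counts[a]
                width = math.sqrt(2.0 * math.log(t + 1) / counts[a])
                ucb.append(min(1.0, mean + width))
                lcb.append(max(0.0, mean - width))

        # Optimistic action: maximize the upper confidence bound.
        optimistic = max(range(n_arms), key=lambda a: ucb[a])

        # Pessimistic cumulative reward if the optimistic arm is played now.
        pessimistic_total = sum(counts[a] * lcb[a] for a in range(n_arms))
        pessimistic_total += lcb[optimistic]

        # Safety constraint: stay within a fraction of baseline performance.
        required = (1.0 - alpha) * t * base_mean
        action = optimistic if pessimistic_total >= required else baseline_arm

        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        sums[action] += reward
        choices.append(action)

    return choices
```

With a small `alpha`, the loop starts by playing the baseline arm, because nothing is yet known about the alternatives, and it explores only once the accumulated baseline reward leaves enough slack. With `alpha=1.0` the constraint never binds and the loop reduces to a plain upper-confidence-bound strategy.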

Bibliographic details

Publication number: US11004011B2

Patent type:
Publication date: 2021-05-11

Original format: PDF
Applicant/assignee: ADOBE INC.

Application number: US201715424695
Inventors: MOHAMMAD GHAVAMZADEH; ABBAS KAZEROUNI

Filing date: 2017-02-03
Classification: G06N20; G06F16/9535; G06Q30/02; G06Q40/06
Country: US
Date indexed: 2024-06-14 21:31:45
