SYSTEMS AND METHODS FOR SAFE POLICY IMPROVEMENT FOR TASK ORIENTED DIALOGUES

Abstract

Embodiments described herein provide safe policy improvement (SPI) in a batch reinforcement learning framework for task-oriented dialogue. Specifically, a batch reinforcement learning framework for dialogue policy learning is provided that improves dialogue performance and learns to shape a reward that reasons about the intention behind a human response rather than merely imitating the human demonstration.