SYSTEMS AND METHODS FOR SAFE POLICY IMPROVEMENT FOR TASK ORIENTED DIALOGUES
Abstract
Embodiments described herein provide safe policy improvement (SPI) in a batch reinforcement learning framework for task-oriented dialogue. Specifically, a batch reinforcement learning framework for dialogue policy learning is provided, which improves dialogue performance and learns to shape a reward that reasons about the intention behind a human response rather than merely imitating the human demonstration.