JMLR: Workshop and Conference Proceedings

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

Abstract

This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity. An RNN is first pre-trained on data using maximum likelihood estimation (MLE), and the probability distribution over the next token in the sequence learned by this model is treated as a prior policy. Another RNN is then trained using reinforcement learning (RL) to generate higher-quality outputs that account for domain-specific incentives while retaining proximity to the prior policy of the MLE RNN. To formalize this objective, we derive novel off-policy RL methods for RNNs from KL-control. The effectiveness of the approach is demonstrated on two applications: 1) generating novel musical melodies, and 2) computational molecular generation. For both problems, we show that the proposed method improves the desired properties and structure of the generated sequences, while maintaining information learned from data.
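To make the objective concrete, the generic KL-control formulation underlying this kind of fine-tuning is sketched below: the fine-tuned policy π maximizes the task reward r while a KL term penalizes divergence from the prior policy p learned by MLE. This is the standard form rather than the paper's exact derivation; the weight c and the specific off-policy estimators derived from it are detailed in the paper itself.

```latex
% Generic KL-control fine-tuning objective (sketch; requires amsmath).
% Maximize expected task reward while staying close to the
% MLE-trained prior policy p(a | s); c >= 0 sets the trade-off.
\[
  \max_{\pi}\;
  \mathbb{E}_{\pi}\!\Bigl[\sum_{t} r(s_t, a_t)\Bigr]
  \;-\; c \sum_{t}
  \mathbb{E}_{\pi}\!\Bigl[
    \mathrm{KL}\bigl(\pi(\cdot \mid s_t)\,\big\|\,p(\cdot \mid s_t)\bigr)
  \Bigr]
\]
```

Expanding the KL term, KL(π‖p) = E_π[log π − log p], rewrites the objective as an expected return with the per-step reward augmented by c·log p(a_t|s_t) plus an entropy bonus on π, which is what lets standard off-policy RL machinery such as Q-learning be reused for the fine-tuning step.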

