JMLR: Workshop and Conference Proceedings

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

Abstract

This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity. An RNN is first pre-trained on data using maximum likelihood estimation (MLE), and the probability distribution over the next token in the sequence learned by this model is treated as a prior policy. Another RNN is then trained using reinforcement learning (RL) to generate higher-quality outputs that account for domain-specific incentives while retaining proximity to the prior policy of the MLE RNN. To formalize this objective, we derive novel off-policy RL methods for RNNs from KL-control. The effectiveness of the approach is demonstrated on two applications: 1) generating novel musical melodies, and 2) computational molecular generation. For both problems, we show that the proposed method improves the desired properties and structure of the generated sequences, while maintaining information learned from data.
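To make the objective concrete, the generic KL-control formulation underlying this kind of fine-tuning is sketched below: the fine-tuned policy π maximizes the task reward r while a KL term penalizes divergence from the prior policy p learned by MLE. This is the standard form rather than the paper's exact derivation; the weight c and the specific off-policy estimators derived from it are detailed in the paper itself.

```latex
% Generic KL-control fine-tuning objective (sketch; requires amsmath).
% Maximize expected task reward while staying close to the
% MLE-trained prior policy p(a | s); c >= 0 sets the trade-off.
\[
  \max_{\pi}\;
  \mathbb{E}_{\pi}\!\Bigl[\sum_{t} r(s_t, a_t)\Bigr]
  \;-\; c \sum_{t}
  \mathbb{E}_{\pi}\!\Bigl[
    \mathrm{KL}\bigl(\pi(\cdot \mid s_t)\,\big\|\,p(\cdot \mid s_t)\bigr)
  \Bigr]
\]
```

Expanding the KL term, KL(π‖p) = E_π[log π − log p], rewrites the objective as an expected return with the per-step reward augmented by c·log p(a_t|s_t) plus an entropy bonus on π, which is what lets standard off-policy RL machinery such as Q-learning be reused for the fine-tuning step.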

