In this paper, we propose a novel pretraining-based encoder-decoder framework that generates the output sequence from the input sequence in a two-stage manner. For the encoder, we use BERT to encode the input sequence into context representations. The decoder operates in two stages: in the first stage, a Transformer-based decoder generates a draft output sequence; in the second stage, we mask each word of the draft sequence and feed it to BERT, and then, combining the input sequence with the draft representations produced by BERT, a Transformer-based decoder predicts the refined word for each masked position. To the best of our knowledge, our approach is the first to apply BERT to text generation tasks. As a first step in this direction, we evaluate the proposed method on text summarization. Experimental results show that our model achieves new state-of-the-art results on both the CNN/Daily Mail and New York Times datasets.
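The two-stage decoding procedure described above can be sketched in a few lines. This is a toy illustration only: `encode`, `draft_decoder`, and `refine_decoder` are hypothetical stand-ins for the paper's BERT encoder and Transformer decoders, and the token-level behavior is dummy logic chosen just to make the control flow concrete.

```python
# Toy sketch of draft-then-refine decoding (not the paper's actual models).
MASK = "[MASK]"

def encode(source_tokens):
    # Stand-in for BERT encoding the input into context representations.
    return {"context": source_tokens}

def draft_decoder(context, max_len=5):
    # Stand-in for the stage-1 Transformer decoder producing a draft.
    # Dummy behavior: copy the first max_len source tokens.
    return context["context"][:max_len]

def refine_decoder(context, masked_draft, position):
    # Stand-in for the stage-2 decoder predicting the word at the masked
    # slot, conditioned on the source context and the masked draft
    # (which the paper re-encodes with BERT). Dummy behavior: uppercase.
    return context["context"][position].upper()

def two_stage_generate(source_tokens, max_len=5):
    ctx = encode(source_tokens)
    draft = draft_decoder(ctx, max_len)            # stage 1: draft sequence
    refined = []
    for i in range(len(draft)):                    # stage 2: mask each word
        masked = draft[:i] + [MASK] + draft[i + 1:]
        refined.append(refine_decoder(ctx, masked, i))
    return refined

print(two_stage_generate(["the", "cat", "sat", "on", "the", "mat"]))
# → ['THE', 'CAT', 'SAT', 'ON', 'THE']
```

The key structural point is that stage 2 conditions each prediction on the *entire* draft (minus the masked word), so every refined word sees bidirectional context, which is what re-encoding the masked draft with BERT provides.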