BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle

Abstract

The principle of the Information Bottleneck (Tishby et al., 1999) is to produce a summary of information X optimized to predict some other relevant information Y. In this paper, we propose a novel approach to unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective: given a sentence, our approach seeks a compressed sentence that can best predict the next sentence. Our iterative algorithm under the Information Bottleneck objective searches gradually shorter subsequences of the given sentence while maximizing the probability of the next sentence conditioned on the summary. Using only pre-trained language models with no direct supervision, our approach can efficiently perform extractive sentence summarization over a large corpus. Building on our unsupervised extractive summarization (BottleSum^(Ex)), we then present a new approach to self-supervised abstractive summarization (BottleSum^(Self)), where a transformer-based language model is trained on the output summaries of our unsupervised method. Empirical results demonstrate that our extractive method outperforms other unsupervised models on multiple automatic metrics. In addition, we find that our self-supervised abstractive model outperforms unsupervised baselines (including our own) by human evaluation along multiple attributes.
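The Information Bottleneck principle cited above seeks a compressed representation S of X minimizing I(S; X) − β·I(S; Y); the abstract describes instantiating this with a pretrained language model, using the probability of the summary as a compression pressure and the probability of the next sentence conditioned on the summary as the relevance term. The following is a minimal sketch of that iterative deletion search, assuming GPT-2 from the Hugging Face transformers library as the scoring model; function names, the span-deletion schedule, and the pruning parameters here are illustrative simplifications, not the paper's exact settings.

```python
# Minimal sketch of a BottleSum^(Ex)-style iterative deletion search,
# assuming GPT-2 via Hugging Face transformers as the scoring LM.
# Simplified relative to the paper; scores are recomputed without caching.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def log_prob(words):
    """Approximate log p(summary) under the LM (compression proxy)."""
    ids = tokenizer.encode(" ".join(words), return_tensors="pt")
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean token NLL
    return -loss.item() * ids.size(1)

def log_prob_next(summary_words, next_sentence):
    """Approximate log p(next_sentence | summary) (relevance term)."""
    ctx = tokenizer.encode(" ".join(summary_words), return_tensors="pt")
    tgt = tokenizer.encode(" " + next_sentence, return_tensors="pt")
    ids = torch.cat([ctx, tgt], dim=1)
    labels = ids.clone()
    labels[:, : ctx.size(1)] = -100          # score only the next sentence
    with torch.no_grad():
        loss = model(ids, labels=labels).loss
    return -loss.item() * tgt.size(1)

def bottlesum_ex(sentence, next_sentence, k=3, beam=5):
    """Iteratively delete short spans, keep fluent shorter candidates,
    and pick the final summary by relevance to the next sentence."""
    words = sentence.split()
    frontier, finished = [words], [words]
    while frontier:
        candidates = []
        for cand in frontier:
            # propose every deletion of a contiguous span of 1..k words
            for i in range(len(cand)):
                for j in range(i + 1, min(i + k, len(cand)) + 1):
                    shorter = cand[:i] + cand[j:]
                    if shorter:
                        candidates.append(shorter)
        # compression pressure: keep only the most fluent shorter candidates
        frontier = sorted(candidates, key=log_prob, reverse=True)[:beam]
        finished.extend(frontier)
    # relevance: final summary best predicts the next sentence
    best = max(finished, key=lambda s: log_prob_next(s, next_sentence))
    return " ".join(best)
```

Because every candidate is strictly shorter than its parent, the search terminates after at most sentence-length iterations; the beam cap keeps the number of LM calls per iteration bounded.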