BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle

Abstract

The principle of the Information Bottleneck (Tishby et al., 1999) is to produce a summary of information X optimized to predict some other relevant information Y. In this paper, we propose a novel approach to unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective: given a sentence, our approach seeks a compressed sentence that can best predict the next sentence. Our iterative algorithm under the Information Bottleneck objective searches gradually shorter subsequences of the given sentence while maximizing the probability of the next sentence conditioned on the summary. Using only pre-trained language models with no direct supervision, our approach can efficiently perform extractive sentence summarization over a large corpus. Building on our unsupervised extractive summarization (BottleSum^(Ex)), we then present a new approach to self-supervised abstractive summarization (BottleSum^(Self)), where a transformer-based language model is trained on the output summaries of our unsupervised method. Empirical results demonstrate that our extractive method outperforms other unsupervised models on multiple automatic metrics. In addition, we find that our self-supervised abstractive model outperforms unsupervised baselines (including our own) by human evaluation along multiple attributes.
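The Information Bottleneck principle cited above seeks a compressed representation S of X minimizing I(S; X) − β·I(S; Y); the abstract describes instantiating this with a pretrained language model, using the probability of the summary as a compression pressure and the probability of the next sentence conditioned on the summary as the relevance term. The following is a minimal sketch of that iterative deletion search, assuming GPT-2 from the Hugging Face transformers library as the scoring model; function names, the span-deletion schedule, and the pruning parameters here are illustrative simplifications, not the paper's exact settings.

```python
# Minimal sketch of a BottleSum^(Ex)-style iterative deletion search,
# assuming GPT-2 via Hugging Face transformers as the scoring LM.
# Simplified relative to the paper; scores are recomputed without caching.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def log_prob(words):
    """Approximate log p(summary) under the LM (compression proxy)."""
    ids = tokenizer.encode(" ".join(words), return_tensors="pt")
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean token NLL
    return -loss.item() * ids.size(1)

def log_prob_next(summary_words, next_sentence):
    """Approximate log p(next_sentence | summary) (relevance term)."""
    ctx = tokenizer.encode(" ".join(summary_words), return_tensors="pt")
    tgt = tokenizer.encode(" " + next_sentence, return_tensors="pt")
    ids = torch.cat([ctx, tgt], dim=1)
    labels = ids.clone()
    labels[:, : ctx.size(1)] = -100          # score only the next sentence
    with torch.no_grad():
        loss = model(ids, labels=labels).loss
    return -loss.item() * tgt.size(1)

def bottlesum_ex(sentence, next_sentence, k=3, beam=5):
    """Iteratively delete short spans, keep fluent shorter candidates,
    and pick the final summary by relevance to the next sentence."""
    words = sentence.split()
    frontier, finished = [words], [words]
    while frontier:
        candidates = []
        for cand in frontier:
            # propose every deletion of a contiguous span of 1..k words
            for i in range(len(cand)):
                for j in range(i + 1, min(i + k, len(cand)) + 1):
                    shorter = cand[:i] + cand[j:]
                    if shorter:
                        candidates.append(shorter)
        # compression pressure: keep only the most fluent shorter candidates
        frontier = sorted(candidates, key=log_prob, reverse=True)[:beam]
        finished.extend(frontier)
    # relevance: final summary best predicts the next sentence
    best = max(finished, key=lambda s: log_prob_next(s, next_sentence))
    return " ".join(best)
```

Because every candidate is strictly shorter than its parent, the search terminates after at most sentence-length iterations; the beam cap keeps the number of LM calls per iteration bounded.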