Source: JMLR: Workshop and Conference Proceedings

An Efficient Approach for Multi-Sentence Compression


Abstract

Multi-Sentence Compression (MSC) is of great value to many real-world applications, such as guided microblog summarization, opinion summarization and newswire summarization. Recently, word graph-based approaches have been proposed and have become popular in MSC. Their key assumption is that redundancy among a set of related sentences provides a reliable way to generate informative and grammatical sentences. In this paper, we propose an effective approach to enhance word graph-based MSC and tackle an issue that most state-of-the-art MSC approaches face: improving informativity and grammaticality at the same time. Our approach consists of three main components: (1) a merging method based on Multiword Expressions (MWE); (2) a mapping strategy based on synonymy between words; (3) a re-ranking step that uses a POS-based language model (POS-LM) to identify the best of the generated compression candidates. We demonstrate the effectiveness of this novel approach on a dataset of clusters of English newswire sentences. The observed improvements in the informativity and grammaticality of the generated compressions correspond to an error reduction of up to 44% over state-of-the-art MSC systems.
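To make the word-graph assumption concrete, here is a minimal sketch of the baseline idea the abstract builds on: related sentences are merged into one directed graph of word transitions, and a compression is read off as a cheap path from a start marker to an end marker. It is not the paper's method. It omits the MWE merging, the synonym mapping and the POS-LM re-ranking, and it merges words by lowercase form only. The names `build_word_graph` and `compress`, the whitespace tokenization, and the edge cost of 1/frequency are all illustrative assumptions.

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def build_word_graph(sentences):
    """Count how often each word-to-word transition occurs across sentences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        # Assumption: naive lowercasing and whitespace tokenization.
        tokens = [START] + sentence.lower().split() + [END]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts


def compress(sentences):
    """Return the words on the cheapest START -> END path (Dijkstra search).

    Assumption: an edge costs 1/frequency, so transitions shared by many
    sentences (the redundant ones) are cheap to follow.
    """
    counts = build_word_graph(sentences)
    heap = [(0.0, [START])]
    settled = {}
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            return " ".join(path[1:-1])
        if node in settled and settled[node] <= cost:
            continue  # already reached this word at least as cheaply
        settled[node] = cost
        for nxt, freq in counts[node].items():
            heapq.heappush(heap, (cost + 1.0 / freq, path + [nxt]))
    return ""


cluster = [
    "obama visited paris on monday",
    "president obama visited paris",
    "obama visited paris to meet officials",
]
print(compress(cluster))
```

On this toy cluster the path follows the transitions that all three sentences share. A plain shortest path like this tends to favor very short outputs, which is one reason practical systems add constraints or a re-ranking stage such as the one the abstract describes.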
