Conference: International Symposium on Symbolic and Numeric Algorithms for Scientific Computing

Extractive Summarization using Cohesion Network Analysis and Submodular Set Functions

Abstract

Numerous approaches have been introduced to automate text summarization, but only a few can be easily adapted to multiple languages. This paper introduces a multilingual text processing pipeline, integrated in the open-source ReaderBench framework, that can be retrofitted to cover more than 50 languages. Given the goal of extensibility and the lack of labeled training data in languages other than English, an unsupervised algorithm was preferred for extractive summarization (i.e., selecting the most representative sentences from the original document). Specifically, two approaches relying on text cohesion were implemented: a) a graph-based text representation derived from Cohesion Network Analysis that extends TextRank, and b) a class of submodular set functions. Evaluations were performed on the DUC dataset, with the Gensim implementation of TextRank as the baseline. With the submodular set functions, our results outperform this baseline. In addition, two use cases, in English and Romanian, are presented, together with the corresponding graphical representations for both methods.
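The abstract names the two unsupervised selectors but gives no formulas. The Python below is a rough sketch of each idea and rests on several assumptions. First, plain bag-of-words cosine similarity stands in for the cohesion scores that ReaderBench computes through Cohesion Network Analysis. Second, facility location is used as one representative member of "a class of submodular set functions", because the abstract does not say which members the paper uses. The sketch ranks sentences with weighted PageRank, which is the idea behind TextRank. It then greedily picks a fixed number of sentences by maximizing coverage. All function names are hypothetical and do not come from ReaderBench or Gensim.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercased alphabetic tokens; a crude stand-in for a full NLP pipeline.
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters); never negative.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(sentences):
    # Hypothetical cohesion proxy: pairwise bag-of-words cosine similarity.
    vectors = [Counter(tokenize(s)) for s in sentences]
    return [[cosine(u, v) for v in vectors] for u in vectors]


def textrank(sim, damping=0.85, iterations=50):
    # Weighted PageRank over the sentence graph, ignoring self-loops.
    n = len(sim)
    out_weight = [sum(sim[i][k] for k in range(n) if k != i) for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def facility_location(sim, selected):
    # f(S) = sum_i max_{j in S} sim[i][j]: how well S "covers" every sentence.
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_summary(sim, budget):
    # Greedy maximization under a cardinality budget (number of sentences).
    selected = []
    for _ in range(min(budget, len(sim))):
        base = facility_location(sim, selected)
        best, best_gain = None, -1.0
        for j in range(len(sim)):
            if j in selected:
                continue
            gain = facility_location(sim, selected + [j]) - base
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
    return sorted(selected)  # present chosen sentences in document order


if __name__ == "__main__":
    doc = [
        "Cohesion links sentences that share words and ideas.",
        "Graph methods rank sentences by how strongly they are linked.",
        "Submodular functions reward coverage of the ideas in a document.",
        "The weather was pleasant that day.",
    ]
    sim = similarity_matrix(doc)
    scores = textrank(sim)
    print("TextRank order:", sorted(range(len(doc)), key=lambda i: -scores[i]))
    print("Submodular pick:", greedy_summary(sim, 2))
```

With non-negative similarities, facility location is monotone and submodular: adding a sentence never lowers the score, and each sentence adds less once similar sentences are already selected. This diminishing return is what discourages redundant picks. A classic result also guarantees that the simple greedy loop reaches at least (1 - 1/e) of the optimum under a cardinality budget. The PageRank ranking, by contrast, scores sentences independently, so it can place several near-duplicate sentences at the top.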