Information Processing & Management

An abstractive Arabic text summarizer with user controlled granularity


Abstract

Automated summaries help tackle the ever-growing volume of available information. Summaries fall into two broad categories: extractive and abstractive. An extractive summary retains the most important sentences more or less in their original structure, while an abstractive summary requires fusing multiple sentences and/or paraphrasing, which makes it the more challenging task. In this paper, we present a novel generic abstractive summarizer for single documents in Arabic. The system first segments the input text by topic. Each textual segment is then summarized extractively. Finally, we apply a rule-based sentence reduction technique. The RST-based extractive summarizer is an enhanced version of the system in Azmi and Al-Thanyyan (2012). By controlling the size of each segment's extractive summary, we can cap the size of the final abstractive summary. We evaluated both summarizers, the enhanced extractive one and the abstractive one. The enhanced extractive summarizer was tested on the same dataset used in Azmi and Al-Thanyyan (2012), using recall, precision and ROUGE. The results show a noticeable improvement in performance, especially in precision for shorter summaries. The abstractive summarizer was tested on a set of 150 documents, generating summaries at 50%, 40%, 30% and 20% of the original's word count. The summaries were assessed by two human experts, who graded them on a scale with a maximum score of 5. Average scores ranged from 4.53 down to 1.92 across the different granularities, with shorter summaries receiving lower scores. The experimental results are encouraging and demonstrate the effectiveness of our approach.
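The abstract states that capping each segment's extractive summary caps the final summary, but it does not say how per-segment sizes are chosen. The sketch below illustrates one way that budget control could work, under stated assumptions: each topic segment receives a word budget proportional to its length (`ratio` times its word count), sentences are picked greedily by a precomputed score that merely stands in for the paper's RST-based ranking, and a toy reduction rule (dropping parenthetical asides) stands in for the paper's Arabic sentence reduction rules, which are not described here. All names (`Sentence`, `extract_segment`, `reduce_sentence`, `summarize`) are hypothetical, and placeholder Latin tokens are used in place of Arabic text.

```python
import re
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    score: float  # hypothetical stand-in for an RST-based importance score


def word_count(text: str) -> int:
    # Whitespace tokenization; a real Arabic system would need proper tokenization.
    return len(text.split())


def extract_segment(sentences: list[Sentence], budget: int) -> list[Sentence]:
    """Greedily keep the highest-scoring sentences that fit within `budget` words,
    then return them in their original document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: sentences[i].score, reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = word_count(sentences[i].text)
        if used + n <= budget:
            chosen.append(i)
            used += n
    return [sentences[i] for i in sorted(chosen)]


def reduce_sentence(text: str) -> str:
    """Hypothetical toy reduction rule: remove parenthetical asides.
    It only removes words, so it can never push a summary past its cap."""
    return re.sub(r"\s*\([^)]*\)", "", text).strip()


def summarize(segments: list[list[Sentence]], ratio: float) -> list[str]:
    """Assumed budget scheme: each topic segment gets int(ratio * its word count)
    words, so the extract (and hence the reduced summary) stays within
    ratio * total input words."""
    summary = []
    for seg in segments:
        seg_words = sum(word_count(s.text) for s in seg)
        budget = int(ratio * seg_words)
        for s in extract_segment(seg, budget):
            summary.append(reduce_sentence(s.text))
    return summary
```

Because the per-segment budgets are rounded down and reduction only deletes words, the summary length is bounded by `ratio` times the input length; a setting such as `ratio=0.2` would correspond to the 20% granularity mentioned above. A weakness of this simple scheme is that a segment whose sentences are all longer than its budget contributes nothing.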