Extractive multi-document summarization based on textual entailment and sentence compression via knapsack problem


Abstract

As the amount of data in computer networks grows, searching for and finding suitable information becomes harder for users. One of the most widespread forms of information on such networks is the textual document, so exploring these documents to learn about their content is difficult and sometimes impossible. Multi-document text summarization systems help by producing a summary of fixed, predefined length that covers as much of the content of the input documents as possible. This paper presents a novel method for multi-document extractive summarization based on textual entailment relations and sentence compression, formulating the problem as a knapsack problem. In this approach, the sentences of the documents are ranked according to the extended TF-IDF method, and entailment scores are then computed for the selected sentences. From these scores, the final score of each sentence is calculated. Finally, after sentence lengths are reduced through sentence compression, the problem is solved with greedy and dynamic programming approaches to the knapsack problem. Experiments on standard summarization datasets, evaluated with the ROUGE system, show that the suggested method, to the best of our knowledge, increases the F-measure of query-based summarization systems by two per cent and the F-measure of general summarization systems by five per cent.
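The selection step described in the abstract can be illustrated with a small sketch. The abstract does not give the scoring formula, the compression method, or the exact knapsack formulation, so the mapping used here is an assumption: each sentence's final score is treated as the item value, its length in words (after compression) as the item weight, and the predefined summary length as the knapsack capacity. The function names `knapsack_dp` and `knapsack_greedy` and the input numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def knapsack_dp(scores: Sequence[float], lengths: Sequence[int],
                budget: int) -> Tuple[float, List[int]]:
    """0/1 knapsack by dynamic programming (assumed formulation, not the paper's code).

    scores[i]  : hypothetical final score of sentence i
    lengths[i] : hypothetical length of sentence i in words (positive integer)
    budget     : maximum summary length in words
    """
    n = len(scores)
    # best[i][b] = best total score using only the first i sentences within b words
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip sentence i-1
            if lengths[i - 1] <= b:      # option 2: take it if it fits
                cand = best[i - 1][b - lengths[i - 1]] + scores[i - 1]
                if cand > best[i][b]:
                    best[i][b] = cand
    # Walk back through the table to recover which sentences were taken.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    chosen.reverse()
    return best[n][budget], chosen


def knapsack_greedy(scores: Sequence[float], lengths: Sequence[int],
                    budget: int) -> Tuple[float, List[int]]:
    """Greedy heuristic: take sentences in decreasing score-per-word order."""
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i] / lengths[i], reverse=True)
    chosen: List[int] = []
    used, total = 0, 0.0
    for i in order:
        if used + lengths[i] <= budget:
            chosen.append(i)
            used += lengths[i]
            total += scores[i]
    return total, sorted(chosen)


if __name__ == "__main__":
    # Made-up scores and lengths for three sentences, with a 5-word budget.
    scores = [6.0, 10.0, 12.0]
    lengths = [1, 2, 3]
    print(knapsack_dp(scores, lengths, 5))      # (22.0, [1, 2])
    print(knapsack_greedy(scores, lengths, 5))  # (16.0, [0, 1])
```

On this toy input the dynamic programming variant finds a higher-scoring selection than the greedy heuristic, which is the usual trade-off between the two: the greedy pass is cheaper but not guaranteed to be optimal for the 0/1 knapsack problem.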

Bibliographic details

  • Source
    Natural Language Engineering | 2019, Issue 1 | pp. 121-146 | 26 pages
  • Author affiliations

    Shahid Bahonar Univ Kerman, Fac Math & Comp, Dept Appl Math, Kerman, Iran;

    Shahid Bahonar Univ Kerman, Fac Math & Comp, Dept Comp Sci, Kerman, Iran;

    Shahid Bahonar Univ Kerman, Fac Math & Comp, Dept Comp Sci, Kerman, Iran;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

  • Date added to database: 2022-08-18 04:14:10
