Grouping sentences as better language unit for extractive text summarization


Abstract

Most existing methods for extractive text summarization extract important sentences using statistical or linguistic techniques and concatenate them into a summary. However, the extracted sentences are often incoherent when read together, and the problem worsens when the source text and the summary are long and built on logical reasoning. This paper addresses two related questions: What is the best language unit for constructing a coherent and understandable summary? How should the extractive summarization process operate on that unit? Extracting larger language units, such as a group of sentences or a paragraph, is a natural way to improve the readability of a summary, because it is reasonable to assume that the original sentences within a larger unit are coherent. This paper proposes a framework for group-based text summarization that clusters semantically related sentences into groups based on the Semantic Link Network (SLN), ranks the groups, and concatenates the top-ranked ones into a summary. A two-layer SLN model generates and ranks groups using four types of semantic links: is-part-of, sequential, similar-to, and cause-effect. The experimental results show that summaries composed of groups or paragraphs tend to contain more key words or phrases than summaries composed of sentences, and that summaries composed of groups contain more key words or phrases than those composed of paragraphs, especially when the average length of the source texts is between 7000 and 17,000 words, the usual length of scientific papers. Further, we compare seven clustering algorithms for generating groups and propose five strategies for generating groups with the four types of semantic links.
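
The group-then-rank-then-concatenate pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's two-layer SLN model: as a stated simplification, it approximates the similar-to and sequential links with word-overlap cosine similarity between adjacent sentences, and it ranks groups by average word frequency. Every function name, the similarity threshold, and the scoring rule are assumptions introduced for illustration only.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    """Lowercased word counts for one sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def group_sentences(sentences, threshold=0.2):
    """Group adjacent sentences (assumed stand-in for SLN-based clustering).

    A sentence joins the current group when it is similar enough to the
    sentence before it; otherwise it starts a new group. Groups are lists
    of sentence indices, so the original order is preserved.
    """
    groups = []
    for i, sentence in enumerate(sentences):
        if groups and cosine(
            bag_of_words(sentences[i - 1]), bag_of_words(sentence)
        ) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def rank_groups(sentences, groups):
    """Order groups by the average document-level frequency of their words."""
    doc_freq = Counter()
    for sentence in sentences:
        doc_freq.update(bag_of_words(sentence))

    def score(group):
        words = [w for i in group for w in bag_of_words(sentences[i]).elements()]
        return sum(doc_freq[w] for w in words) / len(words) if words else 0.0

    return sorted(groups, key=score, reverse=True)


def summarize(text, max_groups=1, threshold=0.2):
    """Concatenate the top-ranked groups, restored to source order."""
    sentences = split_sentences(text)
    groups = group_sentences(sentences, threshold)
    top = rank_groups(sentences, groups)[:max_groups]
    top.sort(key=lambda g: g[0])
    return " ".join(sentences[i] for g in top for i in g)


def keyword_coverage(summary, keywords):
    """Fraction of the given key words or phrases that occur in the summary."""
    lowered = summary.lower()
    if not keywords:
        return 0.0
    return sum(k.lower() in lowered for k in keywords) / len(keywords)
```

Because each selected unit is a whole group, sentences that were adjacent and related in the source stay together in the output, which is the coherence argument the abstract makes for larger language units. The `keyword_coverage` helper is only a rough analogue of the key word and phrase comparison mentioned in the experimental results.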

Bibliographic record

  • Source
    Future Generation Computer Systems | 2020, Issue 8 | pp. 331-359 | 29 pages
  • Authors

    Mengyun Cao; Hai Zhuge;

  • Author affiliations

    Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China; School of Computer Science and Network Engineering, Guangzhou University, Guangzhou, China; School of Engineering and Applied Sciences, Aston University, Birmingham, UK


  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    Text summarization; Semantic Link Network; Clustering; Natural language processing;

