Semantic Analysis for Improved Multi-document Summarization of Text.

Abstract

An excess of unstructured data is easily accessible in digital form. This information overload places too heavy a burden on society's capacity to analyze the data and act on it. Focused (i.e., topic-, query-, question-, or category-driven) multi-document summarization is an information-reduction solution whose state of the art now calls for exploring other techniques that model human summarization activity. Existing techniques have been mainly extractive and rely on distributional statistics and complex machine learning over corpora to approach the quality of human summaries. These techniques remain in use, but the field now needs to move toward more abstractive approaches that model the human way of summarizing. This work creates a simple, inexpensive, and domain-independent system architecture for adding semantic analysis to the summarization process. The proposed system is novel in its use of a new semantic analysis metric to better score sentences for selection into a summary. It also simplifies the semantic processing of sentences in order to capture the most likely semantically related information, reduce redundancy, and reduce complexity. The system is evaluated against participants in the Document Understanding Conference and the later Text Analysis Conference using the ROUGE measures of n-gram recall, comparing automated systems, human summaries, and the gold-standard baseline summaries. The goal was to show that semantic analysis for summarization can perform well while remaining simple and inexpensive, without significant loss of recall compared with the foundational baseline system. Current results show improvement over the gold-standard baseline when all factors of this work's semantic analysis technique are used in combination. These factors are the semantic cue-word feature and semantic class weighting, which together identify sentences that carry important information. The semantic triples clustering, which decomposes natural-language sentences into their most basic meaning and selects the most important sentences, added to this improvement. On the standardized summarization evaluation metric ROUGE, this work outperforms the gold-standard baseline system by more than ten ranking positions. This work shows that semantic analysis and lightweight, open-domain techniques have potential.
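The ROUGE evaluation mentioned in the abstract rests on n-gram recall: the fraction of n-grams in the reference summaries that also appear in the system summary. The following is a minimal sketch of that idea only, assuming lowercased whitespace tokenization and pooled counts over all references. The function names `ngrams` and `rouge_n_recall` are hypothetical, and this is not the official ROUGE toolkit or the evaluation setup used in the dissertation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """Clipped n-gram matches divided by the total number of reference n-grams.

    Assumption: plain whitespace tokenization, no stemming or stopword removal.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())
        # Clip each match at the candidate's count so that an n-gram repeated
        # in the reference is credited only as often as the candidate has it.
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
    return matched / total if total else 0.0


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    references = ["the cat was on the mat"]
    print(rouge_n_recall(candidate, references, n=1))
    print(rouge_n_recall(candidate, references, n=2))
```

In this toy case the candidate recovers five of the six reference unigrams and three of the five reference bigrams. Because the measure is recall-oriented, a longer summary can only match more reference n-grams, which is why evaluations of this kind fix a summary length limit.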

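Bibliographic record

  • Author

    Israel, Quinsulon L.

  • Author affiliation

    Drexel University

  • Degree-granting institution: Drexel University
  • Subject: Computer science; Artificial intelligence; Information Technology
  • Degree: Ph.D.
  • Year: 2014
  • Pagination: 135 p.
  • Total pages: 135
  • Original format: PDF
  • Language of text: English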
  • Date added to database: 2022-08-17 11:53:33
