Text summarization using concept hierarchy.

Abstract

This dissertation aims to create new sentences to summarize text documents. In addition to generating new sentences, this project also generates new concepts and extracts key sentences to summarize documents. This project is the first research work that can generate new key concepts and create new sentences to summarize documents.

Automatic document summarization is the process of creating a condensed version of a document. The condensed version extracts the key contents from the original document. Most related research uses statistical methods that generate a summary based on word distribution in the document. In this dissertation, we create a summary based on concept distributions and concept hierarchies. We use the Stanford parser as our syntax parser and ResearchCyc (Cyc) as our knowledge base. Words and phrases of a document are mapped into Cyc concepts. We introduce a unique concept propagation method to generate abstract concepts and use those abstract concepts for the summarization. This method has two advantages over existing methods. One is the use of multi-level upward propagation to solve the word sense disambiguation problem. The other is that the propagation process provides a way to produce generalized concepts.

In the first part of the project, we generate a summary by extracting key concepts and key sentences from documents. We use the Stanford parser to segment a document into sentences and to parse each sentence into words or phrases tagged with their parts of speech. We use Cyc commands to map those words and phrases to their corresponding Cyc concepts and increase the weights of those concepts. To handle word sense disambiguation and to create summarized concepts, we propagate the weights of the concepts upward along the Cyc concept hierarchy. Then, we extract the concepts with the highest weights as the key concepts.
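The upward weight propagation described above can be illustrated with a minimal sketch. The abstract does not give the propagation formula, so the per-level `decay` factor, the `max_levels` cutoff, and the toy hierarchy (`PARENTS`) are assumptions made purely for illustration; they are not taken from the dissertation or from Cyc.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: concept -> immediate superclasses.
# The names are illustrative and are not real Cyc constants.
PARENTS = {
    "Dog": ["Mammal"],
    "Cat": ["Mammal"],
    "Mammal": ["Animal"],
    "Animal": [],
}

def propagate(mentions, parents, decay=0.75, max_levels=3):
    """Add each mention count to its concept and, scaled by `decay`
    per level (an assumed rule), to ancestors up to `max_levels` up."""
    weights = defaultdict(float)
    for concept, count in mentions.items():
        weights[concept] += count
        frontier, share = {concept}, float(count)
        for _ in range(max_levels):
            share *= decay
            # Move one level up; the set avoids counting a parent twice.
            frontier = {p for c in frontier for p in parents.get(c, [])}
            if not frontier:
                break
            for parent in frontier:
                weights[parent] += share
    return dict(weights)

def key_concepts(weights, k):
    """Return the k highest-weighted concepts, ties broken by name."""
    return sorted(weights, key=lambda c: (-weights[c], c))[:k]

weights = propagate({"Dog": 2, "Cat": 2}, PARENTS)
top = key_concepts(weights, 1)
```

In this toy run, the shared superclass `Mammal` accumulates weight from both `Dog` and `Cat` and ends up as the top concept, which is the kind of generalized concept the propagation step is meant to surface.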
To extract key sentences from the document, we weigh each sentence based on the weights of the concepts associated with it. Then, we extract the sentences with the highest weights to summarize the document.

In the second part of the project, we generate new sentences to summarize a document based on the generalized concepts. First, we extract the subject, predicate, and object from each sentence. Then, we create compatibility matrices based on the compatibility between the subjects, predicates, and objects among sentences. Two terms are considered compatible if any one of the following three conditions holds: the two terms are the same concept, one concept is the other concept's immediate superclass, or the two concepts share the same immediate superclass. From the compatibility matrices, we build compatible clusters and finally generate new sentences for each compatible cluster. These newly generated sentences serve as a summary for the document.

We have implemented and tested our approaches. The test results show that our approaches are viable and have great potential for future research.
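The three-way compatibility test stated in the abstract can be sketched as follows. The hierarchy `SUPERS` and the function names are hypothetical stand-ins for lookups against a knowledge base; how the dissertation actually builds clusters from the matrices is not described in the abstract, so that step is omitted here.

```python
# Hypothetical toy hierarchy: concept -> set of immediate superclasses.
# The names are illustrative and are not real Cyc constants.
SUPERS = {
    "Dog": {"Mammal"},
    "Cat": {"Mammal"},
    "Mammal": {"Animal"},
    "Car": {"Vehicle"},
}

def compatible(a, b, supers):
    """True if a and b are the same concept, one is the other's
    immediate superclass, or they share an immediate superclass."""
    sa, sb = supers.get(a, set()), supers.get(b, set())
    return a == b or a in sb or b in sa or bool(sa & sb)

def compatibility_matrix(terms, supers):
    """Pairwise compatibility of terms as a square boolean matrix."""
    return [[compatible(a, b, supers) for b in terms] for a in terms]

terms = ["Dog", "Cat", "Mammal", "Car"]
matrix = compatibility_matrix(terms, SUPERS)
```

Here `Dog` and `Cat` are compatible because they share the immediate superclass `Mammal`, while `Dog` and `Animal` are not, since `Animal` is two levels above `Dog` and the rule only considers immediate superclasses.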

Bibliographic record

  • Author: Huang, Xiaomei
  • Author affiliation: Louisiana Tech University
  • Degree-granting institution: Louisiana Tech University
  • Subject: Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 135 p.
  • Original format: PDF
  • Language: English
