
An Integrated Summarization Framework with Hierarchical Content Representation.



Abstract

With the rapid growth of Internet services, more and more electronic text is accessible online. While this abundance of information gives individuals more resources, it also causes the well-recognized information overload problem, in which an excessive amount of information is provided. Automatic text summarization has emerged to deal with this problem.

Automatic text summarization is the process of creating a shortened version of a text by computational techniques, so that users can grasp the important content of the original text(s) at an affordable time cost. Depending on how the summary is composed, methods are either extractive or abstractive. Extractive methods are currently the mainstream and are the focus of this dissertation.

The main question in extractive summarization is how to select a set of sentences from the input documents to form a summary that best conveys their important content. We approach the question by first discovering important words in the input documents. We propose several content models for word saliency estimation and word-based sentence ranking, and then develop two word-based summarization methods from these content models. Experimental results on several authoritative data sets from the Document Understanding Conference (DUC) tasks demonstrate the effectiveness of the proposed methods.

Our next target is to incorporate the relations between important words into the summarization process. We propose several methods to identify the latent word relations in the input documents and use them to obtain a hierarchical representation of the document content. Based on this hierarchical content representation, we propose a novel hierarchical summarization method that extracts summary sentences in a general-to-specific order. Hierarchical summarization, which has not been studied systematically in previous research, is characterized by integrating various summarization objectives to improve the content and the readability of the composed summaries simultaneously. Experimental results on the DUC data sets demonstrate the advantages of the proposed method over traditional summarization methods.

Finally, we conduct several tentative studies that examine the use of content representations more sophisticated than single words for improving the hierarchical summarization method. These studies capture several important details in developing good hierarchical summarization methods and shed light on directions for future work in hierarchical summarization.
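The word-based extractive idea described in the abstract (estimate word saliency, then rank sentences by the words they contain) can be illustrated with a minimal sketch. The abstract does not specify the dissertation's content models, so this example substitutes plain relative word frequency as the saliency estimate; the stopword list and the function names `word_saliency` and `summarize` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not taken from the dissertation.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "that", "for", "on", "with", "as", "are", "be"}


def split_sentences(text):
    """Split text into sentences at whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def word_saliency(sentences):
    """Stand-in content model: saliency is the word's relative frequency."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def summarize(text, max_sentences=2):
    """Pick the highest-scoring sentences and return them in document order."""
    sentences = split_sentences(text)
    saliency = word_saliency(sentences)

    def score(sentence):
        # Average saliency, so long sentences are not favored by length alone.
        words = tokenize(sentence)
        return sum(saliency[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    print(summarize("Cats chase mice. Cats sleep often. Dogs bark.", 2))
```

Averaging word saliency per sentence is one simple ranking choice among many; it says nothing about how the dissertation handles redundancy or the word relations used in its hierarchical method.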

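Bibliographic record

  • Author

    Ouyang, You

  • Author affiliation

    Hong Kong Polytechnic University (Hong Kong)

  • Degree-granting institution: Hong Kong Polytechnic University (Hong Kong)
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 172 p.
  • Total pages: 172
  • Original format: PDF
  • Language of text: English (eng)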
