Conference: International Conference on Multimedia Modeling

Multi-document Summarization Exploiting Semantic Analysis Based on Tag Cluster

Abstract

Multi-document summarization techniques aim to condense a set of documents into a small number of words or paragraphs that convey the main meaning of the originals. Many approaches to multi-document summarization use probability-based methods and machine learning techniques to simultaneously summarize multiple documents that share a common topic. However, these techniques fail to semantically analyze proper nouns and newly coined words because most of them depend on a conventional dictionary or thesaurus. To overcome these drawbacks, we propose a novel multi-document summarization technique that employs tag clusters on Flickr, a folksonomy system, to detect key sentences in multiple documents. We first create a word frequency table for analyzing the semantics and contribution of words by using the HITS algorithm. Then, by exploiting tag clusters, we analyze the semantic relationships between words in the word frequency table. Experimental results on the TAC 2008 and 2009 data sets demonstrate the improvement of our proposed framework over existing summarization systems.
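
The abstract does not say how the HITS graph is constructed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a bipartite graph in which sentences act as hubs and words act as authorities, with the in-sentence word frequency as the edge weight. The names `tokenize` and `hits_word_sentence` are hypothetical, and the tag-cluster step is not modeled here.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", sentence.lower())


def hits_word_sentence(sentences, iterations=50):
    """Run HITS on an assumed sentence-word bipartite graph.

    Sentences are hubs, words are authorities, and the edge weight is the
    number of times a word occurs in a sentence. Returns a dict of word
    scores and a list of sentence scores, each L2-normalized.
    """
    # Word frequency table: one Counter of word counts per sentence.
    counts = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted({w for c in counts for w in c})

    word_score = {w: 1.0 for w in vocab}
    sent_score = [1.0] * len(sentences)

    for _ in range(iterations):
        # Authority update: a word is important if it occurs in important sentences.
        new_word = {w: 0.0 for w in vocab}
        for i, c in enumerate(counts):
            for w, freq in c.items():
                new_word[w] += freq * sent_score[i]
        norm = math.sqrt(sum(v * v for v in new_word.values())) or 1.0
        word_score = {w: v / norm for w, v in new_word.items()}

        # Hub update: a sentence is important if it contains important words.
        new_sent = [sum(freq * word_score[w] for w, freq in c.items()) for c in counts]
        norm = math.sqrt(sum(v * v for v in new_sent)) or 1.0
        sent_score = [v / norm for v in new_sent]

    return word_score, sent_score
```

Under these assumptions, sentences with the highest hub scores would be candidates for key sentences, and words shared across many high-scoring sentences receive the highest authority scores. A tag-cluster step, as described in the abstract, would presumably adjust these scores using semantic relationships between words, but the abstract gives no detail on how.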
