International Conference on Multimedia Modeling

Multi-document Summarization Exploiting Semantic Analysis Based on Tag Cluster


Abstract

Multi-document summarization techniques aim to reduce a set of documents to a small set of words or paragraphs that conveys the main meaning of the originals. Many approaches to multi-document summarization use probability-based methods and machine learning techniques to summarize multiple documents sharing a common topic at the same time. However, these techniques fail to semantically analyze proper nouns and newly coined words because most of them depend on old-fashioned dictionaries or thesauri. To overcome these drawbacks, we propose a novel multi-document summarization technique that employs tag clusters on Flickr, a kind of folksonomy system, to detect key sentences in multiple documents. We first create a word frequency table and use the HITS algorithm to analyze the semantics and contribution of words. Then, by exploiting tag clusters, we analyze the semantic relationships between words in the word frequency table. Experimental results on the TAC 2008 and 2009 data sets demonstrate the improvement of our proposed framework over existing summarization systems.
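The abstract does not say how the HITS graph is built or how tag clusters enter the scoring, so the following Python sketch is only one plausible reading, not the authors' method. It assumes a bipartite graph in which sentences are hubs and words are authorities, and it assumes that words in the same tag cluster share the highest authority score found in that cluster. The function names, the tiny stopword list, and the hard-coded clusters (standing in for clusters that would come from Flickr) are all hypothetical.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "in", "from"}  # tiny illustrative list (assumption)


def tokenize(sentence):
    # Lowercase, replace non-letters with spaces, drop stopwords.
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in sentence)
    return [w for w in cleaned.split() if w not in STOPWORDS]


def hits_word_scores(sentences, iterations=50):
    """Run HITS on an assumed bipartite sentence-word graph.

    Sentences are hubs, words are authorities, and an edge weight is the
    number of times the word occurs in the sentence (the frequency table).
    Returns (word -> authority score, list of sentence hub scores).
    """
    counts = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted({w for c in counts for w in c})
    hub = [1.0] * len(sentences)
    auth = {w: 1.0 for w in vocab}
    for _ in range(iterations):
        # Authority update: a word matters if strong sentences contain it.
        auth = {w: sum(hub[i] * c[w] for i, c in enumerate(counts)) for w in vocab}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {w: v / norm for w, v in auth.items()}
        # Hub update: a sentence matters if it contains strong words.
        hub = [sum(auth[w] * n for w, n in c.items()) for c in counts]
        norm = math.sqrt(sum(v * v for v in hub)) or 1.0
        hub = [v / norm for v in hub]
    return auth, hub


def smooth_with_tag_clusters(auth, tag_clusters):
    """Assumed use of tag clusters: words in one cluster share the
    highest authority score among the cluster members that occur."""
    weights = dict(auth)
    for cluster in tag_clusters:
        present = [w for w in cluster if w in auth]
        if present:
            best = max(auth[w] for w in present)
            for w in present:
                weights[w] = best
    return weights


def rank_sentences(sentences, tag_clusters, iterations=50):
    """Return sentence indices ordered from most to least salient."""
    auth, _ = hits_word_scores(sentences, iterations)
    weights = smooth_with_tag_clusters(auth, tag_clusters)
    scores = []
    for s in sentences:
        tokens = tokenize(s)
        # Mean word weight, so long sentences are not favored by length alone.
        scores.append(sum(weights[t] for t in tokens) / len(tokens) if tokens else 0.0)
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "The storm flooded the city.",
        "Heavy rain from the hurricane flooded streets in the city.",
        "Officials opened a museum.",
    ]
    clusters = [{"storm", "hurricane", "rain"}]  # hypothetical tag cluster
    print(rank_sentences(docs, clusters))
```

In this toy input, the off-topic third sentence shares no words with the other two, so HITS drives its hub score toward zero and it ranks last. The cluster step lets "storm" inherit the weight of "hurricane" and "rain" even though no dictionary links them, which is the kind of relationship the abstract attributes to tag clusters.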
