Conference: ACM International Conference on Information and Knowledge Management

Temporal Corpus Summarization Using Submodular Word Coverage


Abstract

In many areas of life, we now have almost complete electronic archives reaching back for well over two decades. This includes, for example, the body of research papers in computer science, all news articles written in the US, and most people's personal email. However, we have only rather limited methods for analyzing and understanding these collections. While keyword-based retrieval systems allow efficient access to individual documents in archives, we still lack methods for understanding a corpus as a whole. In this paper, we explore methods that provide a temporal summary of such corpora in terms of landmark documents, authors, and topics. In particular, we explicitly model the temporal nature of influence between documents and re-interpret summarization as a coverage problem over words anchored in time. The resulting models provide monotone submodular objectives for computing informative and non-redundant summaries over time, which can be efficiently optimized with greedy algorithms. Our empirical study shows the effectiveness of our approach over several baselines.
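The coverage idea in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it ignores the temporal anchoring of words and the influence between documents, and uses a plain weighted word-coverage objective instead. The names `coverage`, `greedy_summary`, `docs`, and `weights` are hypothetical and chosen only for this illustration. Weighted coverage with non-negative weights is monotone and submodular, so the standard greedy procedure (repeatedly add the document with the largest marginal gain) is known to reach at least a (1 - 1/e) fraction of the optimal value under a cardinality constraint.

```python
from typing import Dict, List, Set


def coverage(selected: List[Set[str]], weights: Dict[str, float]) -> float:
    """Total weight of the distinct words covered by the selected documents.

    Words missing from `weights` default to weight 1.0 (an assumption of this sketch).
    """
    covered: Set[str] = set().union(*selected)
    return sum(weights.get(w, 1.0) for w in covered)


def greedy_summary(docs: Dict[str, Set[str]],
                   weights: Dict[str, float],
                   k: int) -> List[str]:
    """Select up to k document ids by greedy marginal-gain maximization."""
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(min(k, len(docs))):
        best_id, best_gain = None, 0.0
        for doc_id, words in docs.items():
            if doc_id in chosen:
                continue
            # Marginal gain: weight of words this document adds beyond `covered`.
            gain = sum(weights.get(w, 1.0) for w in words - covered)
            if gain > best_gain:
                best_id, best_gain = doc_id, gain
        if best_id is None:
            # No remaining document covers any new word.
            break
        chosen.append(best_id)
        covered |= docs[best_id]
    return chosen


# Toy corpus (made up for illustration): document id -> set of words.
docs = {
    "a": {"submodular", "greedy", "coverage"},
    "b": {"greedy", "coverage"},
    "c": {"email", "archive"},
}
summary = greedy_summary(docs, weights={}, k=2)
```

In this toy corpus, document "b" is redundant once "a" is selected, so the second pick goes to "c". This is the non-redundancy behaviour that a coverage objective is meant to encourage.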
