Source: Advances in Information Retrieval (conference proceedings)

Summarizing a Document Stream


Abstract

We introduce the task of summarizing a stream of short documents on microblogs such as Twitter. On microblogs, users post thousands of short documents on a given topic, such as a sports match or a TV drama. Two notable characteristics of microblog data are that documents are often highly redundant and that they are aligned on a timeline. A single event within a topic can attract thousands of documents, and two very similar documents can refer to two distinct events when they are temporally distant. We examine microblog data to better understand these characteristics, and propose a summarization model for a stream of short documents on a timeline, along with a fast approximate algorithm for generating a summary. We show empirically that our model generates good summaries on datasets of microblog documents about sports matches.
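The abstract does not spell out the model or the approximate algorithm, so the following is only an illustrative sketch of the two characteristics it names (redundancy and temporal alignment), not the authors' method. It assumes, hypothetically, that a summary document "covers" another document only when the two are both textually similar and close in time, and it picks summary documents with a plain greedy coverage heuristic. The names `covers` and `summarize`, and the parameters `min_sim` and `window`, are inventions for this example.

```python
from typing import List, Set, Tuple

Doc = Tuple[float, str]  # (timestamp in seconds, text); a hypothetical representation


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two short texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def covers(s: Doc, d: Doc, min_sim: float, window: float) -> bool:
    """Assumption: s covers d only if they are similar AND close in time."""
    return abs(s[0] - d[0]) <= window and jaccard(s[1], d[1]) >= min_sim


def summarize(docs: List[Doc], k: int,
              min_sim: float = 0.5, window: float = 300.0) -> List[Doc]:
    """Greedily pick up to k documents, each covering the most uncovered ones."""
    uncovered: Set[int] = set(range(len(docs)))
    chosen: List[int] = []
    for _ in range(k):
        if not uncovered:
            break
        best, best_gain = None, set()
        for i in range(len(docs)):
            if i in chosen:
                continue
            gain = {j for j in uncovered
                    if covers(docs[i], docs[j], min_sim, window)}
            if len(gain) > len(best_gain):
                best, best_gain = i, gain
        if best is None:
            break
        chosen.append(best)
        uncovered -= best_gain
    # Return the summary in timeline order.
    return sorted((docs[i] for i in chosen), key=lambda d: d[0])


stream = [
    (10.0, "goal by team a"),
    (20.0, "goal by team a !"),
    (30.0, "great goal by team a"),
    (3000.0, "goal by team a"),
    (3010.0, "goal by team a again"),
]
summary = summarize(stream, k=2)
```

With a 300-second window, the two identical texts posted at 10 s and 3000 s are treated as reports of separate events, so both appear in the summary. With a very large window they would collapse into one entry, which is the failure mode the abstract warns about for similarity-only approaches.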
