Conference paper — International Conference on Neural Information Processing (ICONIP 2011)

News Thread Extraction Based on Topical N-Gram Model with a Background Distribution



Abstract

Automatic thread extraction for news events can help people understand the different aspects of a news event. In this paper, we present an extraction method using a topical N-gram model with a background distribution (TNB). Unlike most topic models, such as Latent Dirichlet Allocation (LDA), which rely on the bag-of-words assumption, our model treats words in their textual order. Each news report is represented as a combination of a background distribution over the corpus and a mixture distribution over hidden news threads. Our model can thus treat "presidential election" across different years as a background phrase, and "Obama wins" as a thread for the event "2008 USA presidential election". We apply our method to two different corpora. Evaluation based on human judgment shows that the model can generate meaningful and interpretable threads from a news corpus.
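The generative idea described above can be sketched in code. The following is a minimal, hypothetical illustration of a TNB-style generative process — not the paper's actual model or inference procedure: each token is drawn either from a corpus-wide background distribution or from one of the hidden thread distributions, and with some probability a token continues the previous token as a phrase (the n-gram aspect), inheriting its source. All distributions, parameter names, and probability values (`p_background`, `p_bigram`) here are illustrative assumptions.

```python
import random

def generate_document(n_words, background, threads, thread_weights,
                      p_background=0.3, p_bigram=0.2, seed=0):
    """Simplified sketch of a TNB-style generative process.

    background     -- dict word -> prob (corpus-wide background distribution)
    threads        -- list of dicts word -> prob (hidden news threads)
    thread_weights -- this document's mixture weights over threads
    p_background   -- chance a token is a background word (assumed value)
    p_bigram       -- chance a token continues the previous one as a phrase
    """
    rng = random.Random(seed)

    def draw(dist):
        # Sample one key from a dict of probabilities via inverse CDF.
        r, acc = rng.random(), 0.0
        for key, p in dist.items():
            acc += p
            if r <= acc:
                return key
        return key  # guard against floating-point rounding

    words, sources = [], []
    for i in range(n_words):
        if i > 0 and rng.random() < p_bigram:
            source = sources[-1]          # phrase continuation: keep the source
        elif rng.random() < p_background:
            source = "background"         # corpus-wide background word
        else:
            # Pick a hidden thread from the document's mixture.
            source = draw(dict(enumerate(thread_weights)))
        dist = background if source == "background" else threads[source]
        words.append(draw(dist))
        sources.append(source)
    return words, sources
```

For example, with a background of {"presidential", "election"} and threads like {"obama", "wins"}, generated documents mix background phrases with thread-specific ones, mirroring the "presidential election" / "Obama wins" distinction in the abstract.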
