Conference: International Conference on Web Information Systems Engineering

Efficient Online Novelty Detection in News Streams

Abstract

Novelty detection in text streams is a challenging task that arises in many different scenarios, ranging from email threads to RSS news feeds on a cell phone. An efficient novelty detection algorithm can save the user a great deal of time when accessing interesting information. Most recent research on detecting novel documents in text streams uses either geometric distances or distributional similarities; the former typically perform better but are slower, because each incoming document must be compared with all previously seen ones. In this paper, we propose a new novelty detection algorithm based on the Inverse Document Frequency (IDF) scoring function. Computing novelty based on IDF lets us avoid similarity comparisons with previous documents in the text stream, leading to faster execution times. At the same time, our proposed approach outperforms several commonly used baselines when applied to a real-world dataset of news articles.
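To make the idea of IDF-based scoring concrete, the following is a minimal sketch and not the paper's actual algorithm, whose exact formula the abstract does not give. It assumes whitespace tokenization, a smoothed IDF of the form log((N + 1) / (df + 1)), and a document score equal to the mean IDF of its distinct terms. The class name `IdfNoveltyScorer` and its methods are hypothetical. The point it illustrates is that scoring only consults running document-frequency counts, so no incoming document is compared against earlier ones.

```python
import math
from collections import Counter


class IdfNoveltyScorer:
    """Hypothetical online scorer: mean smoothed IDF of a document's terms."""

    def __init__(self):
        # term -> number of previously seen documents containing the term
        self.doc_freq = Counter()
        self.num_docs = 0

    @staticmethod
    def _terms(text):
        # Assumption: lowercase whitespace tokenization, distinct terms only.
        return set(text.lower().split())

    def score(self, text):
        """Return a novelty score; higher means more rarely seen terms."""
        terms = self._terms(text)
        if not terms:
            return 0.0
        # Assumed smoothed IDF: log((N + 1) / (df + 1)).
        # Unseen terms (df = 0) receive the largest value.
        total = sum(
            math.log((self.num_docs + 1) / (self.doc_freq[t] + 1))
            for t in terms
        )
        return total / len(terms)

    def update(self, text):
        """Add a document to the stream statistics after scoring it."""
        self.doc_freq.update(self._terms(text))
        self.num_docs += 1


scorer = IdfNoveltyScorer()
scorer.update("stocks fall on rate fears")
scorer.update("stocks fall again on rate fears")

repeat_score = scorer.score("stocks fall on rate fears")
fresh_score = scorer.score("volcano erupts in iceland")
print(repeat_score < fresh_score)
```

Under these assumptions, scoring a document costs time proportional to its number of distinct terms, independent of how many documents the stream has already produced. With this particular smoothing, the very first document scores 0 because there is no history to measure it against.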
