Journal: SIGKDD Explorations

SigniTrend: Scalable Detection of Emerging Topics in Textual Streams by Hashed Significance Thresholds



Abstract

Social media such as Twitter or weblogs are a popular source of live textual data. Much of this popularity is due to the fast rate at which this data arrives, and there are a number of global events, such as the Arab Spring, where Twitter is reported to have had a major influence. However, existing methods for emerging topic detection are often only able to detect events of a global magnitude, such as natural disasters or celebrity deaths, and can only monitor user-selected keywords or operate on a curated set of hashtags. Interesting emerging topics may, however, be of much smaller magnitude and may involve the combination of two or more words that themselves are not unusually hot at that time. Our contributions to the detection of emerging trends are threefold: First, we propose a significance measure that can be used to detect emerging topics early, long before they become "hot tags", by drawing upon experience from outlier detection. Second, by using hash tables in a heavy-hitters type algorithm for establishing a noise baseline, we show how to track all keyword pairs using only a fixed amount of memory. Finally, we aggregate the detected co-trends into larger topics using clustering approaches, because a single event will often cause multiple word combinations to trend at the same time.
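The first two contributions named in the abstract (a significance measure borrowed from outlier detection, and hash tables that hold a noise baseline in fixed memory) can be illustrated with a small sketch. The abstract does not give formulas, so everything concrete below is an assumption made for illustration: the class name `HashedTrendSketch`, the exponentially weighted moving mean and variance, the z-score-style significance with a noise floor `beta`, the smoothing factor `alpha`, the number of buckets, and the count-min-style choice of the least contaminated bucket. It is not presented as the authors' implementation.

```python
import hashlib
import math
from itertools import combinations


class HashedTrendSketch:
    """Fixed-memory moving statistics for terms and term pairs (illustrative only)."""

    def __init__(self, num_buckets=4096, num_hashes=3, alpha=0.1, beta=0.001):
        # num_hashes must stay <= 16 because blake2b digests are at most 64 bytes.
        self.num_buckets = num_buckets
        self.num_hashes = num_hashes
        self.alpha = alpha  # assumed smoothing factor of the moving average
        self.beta = beta    # assumed noise floor, keeps rare keys from dividing by ~0
        self.mean = [[0.0] * num_buckets for _ in range(num_hashes)]
        self.var = [[0.0] * num_buckets for _ in range(num_hashes)]

    def _buckets(self, key):
        # Derive one bucket index per hash table from a single digest.
        digest = hashlib.blake2b(key.encode("utf-8"),
                                 digest_size=4 * self.num_hashes).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_buckets
                for i in range(self.num_hashes)]

    def process_batch(self, docs):
        """Score every term and term pair of one time slice, then update the baseline."""
        counts = {}
        for tokens in docs:
            terms = sorted(set(tokens))  # count each key once per document
            keys = terms + ["\t".join(pair) for pair in combinations(terms, 2)]
            for k in keys:
                counts[k] = counts.get(k, 0) + 1
        n = max(len(docs), 1)
        freqs = {k: c / n for k, c in counts.items()}
        slots = {k: self._buckets(k) for k in freqs}

        # Significance: how far the current frequency lies above the baseline,
        # measured in (floored) standard deviations.
        scores = {}
        for k, x in freqs.items():
            # Collisions can only inflate a bucket, so use the smallest mean.
            row = min(range(self.num_hashes),
                      key=lambda i: self.mean[i][slots[k][i]])
            b = slots[k][row]
            mu, var = self.mean[row][b], self.var[row][b]
            scores[k] = (x - max(mu, self.beta)) / (math.sqrt(var) + self.beta)

        # Baseline update: each bucket sees the largest frequency hashed into it
        # (0 if nothing was), then the moving mean and variance are advanced.
        for i in range(self.num_hashes):
            observed = [0.0] * self.num_buckets
            for k, x in freqs.items():
                b = slots[k][i]
                observed[b] = max(observed[b], x)
            for b in range(self.num_buckets):
                delta = observed[b] - self.mean[i][b]
                self.mean[i][b] += self.alpha * delta
                self.var[i][b] = (1 - self.alpha) * (
                    self.var[i][b] + self.alpha * delta * delta)
        return scores


# Usage: 20 identical "quiet" time slices, then one slice with a new word pair.
sketch = HashedTrendSketch()
baseline = [["coffee", "morning"], ["coffee", "work"],
            ["morning", "train"], ["work", "train"]]
for _ in range(20):
    sketch.process_batch(baseline)
scores = sketch.process_batch([["coffee", "morning"], ["earthquake", "chile"],
                               ["earthquake", "chile", "coffee"], ["work", "train"]])
trending = sorted(scores, key=scores.get, reverse=True)[:3]
print(trending)
```

Memory use depends only on `num_buckets` and `num_hashes`, not on how many distinct words or pairs appear in the stream, which is the property the abstract highlights. This sketch updates every bucket on every batch for simplicity; a production version would likely decay untouched buckets lazily.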
