Journal: Data Mining and Knowledge Discovery
Efficient temporal mining of micro-blog texts and its application to event discovery

Abstract

In this paper we present a novel method for clustering words in micro-blogs, based on the similarity of their temporal series. Our technique, named SAX*, uses the Symbolic Aggregate ApproXimation algorithm to discretize the temporal series of each term into a small set of levels, yielding one string per term. We then define a subset of "interesting" strings, i.e. those representing patterns of collective attention. Sliding temporal windows are used to detect co-occurring clusters of tokens with the same or similar string. To assess the performance of the method, we first tune the model parameters on a 2-month 1% Twitter stream, during which a number of world-wide events of differing type and duration (sports, politics, disasters, health, and celebrities) occurred. Then, we evaluate the quality of all discovered events in a 1-year stream, "googling" the most frequent cluster n-grams and manually assessing how many clusters correspond to published news in the same temporal slot. Finally, we perform a complexity evaluation and compare SAX* with three alternative methods for event discovery. Our evaluation shows that SAX* is at least one order of magnitude less complex than other temporal and non-temporal approaches to micro-blog clustering.
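
To make the discretization step concrete, the following is a minimal sketch of standard Symbolic Aggregate ApproXimation applied to per-term frequency series, followed by a simplified grouping of terms whose strings are identical within one window. It is not the authors' SAX* implementation: the function names `sax_word` and `group_by_word`, the two-symbol alphabet, the segment count, and the toy counts are all assumptions made for illustration, and the abstract's filtering of "interesting" strings and matching of similar (not only identical) strings are omitted.

```python
import bisect
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet="ab"):
    """Discretize a numeric series into a string using standard SAX.

    Hypothetical helper: z-normalize, average over equal-length
    segments (PAA), then map each segment mean to a symbol using
    breakpoints that are equiprobable under a standard normal.
    """
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")

    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        # A flat series carries no shape; treat every point as average.
        z = [0.0] * len(series)
    else:
        z = [(x - mu) / sigma for x in series]

    # Piecewise Aggregate Approximation: one mean per segment.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # Breakpoints split N(0, 1) into len(alphabet) equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    return "".join(alphabet[bisect.bisect_left(breakpoints, v)] for v in paa)


def group_by_word(term_series, n_segments, alphabet="ab"):
    """Group terms whose series map to the same SAX string.

    Simplification (assumption): only identical strings are grouped,
    and the input is taken to be the counts inside a single window.
    """
    groups = defaultdict(list)
    for term, series in term_series.items():
        groups[sax_word(series, n_segments, alphabet)].append(term)
    return dict(groups)


# Toy per-term counts inside one temporal window (made-up data).
window_counts = {
    "goal": [0, 0, 0, 0, 10, 10, 0, 0],
    "match": [1, 1, 1, 1, 9, 9, 1, 1],
    "rain": [5, 5, 0, 0, 0, 0, 0, 0],
}
clusters = group_by_word(window_counts, n_segments=4)
print(clusters)
```

In this toy window, "goal" and "match" share the same burst shape and land in the same group, while "rain" peaks earlier and is kept apart. Z-normalizing first means terms are compared by the shape of their series, not by absolute frequency.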