Conference: IEEE International Conference on Data Engineering

STREAMCUBE: Hierarchical spatio-temporal hashtag clustering for event exploration over the Twitter stream

Abstract

What is happening around the world? When and where? Mining the geo-tagged Twitter stream makes it possible to answer these questions in real time. Although a single tweet can be short and noisy, proper aggregation of tweets can provide meaningful results. In this paper, we focus on hierarchical spatio-temporal hashtag clustering techniques. Our system has the following features: (1) Exploring events (hashtag clusters) at different spatial granularities. Users can zoom in and out on maps to find out what is happening in a particular area. (2) Exploring events at different temporal granularities. Users can choose to see what is happening today or what happened in the past week. (3) An efficient single-pass algorithm for event identification, which provides human-readable hashtag clusters. (4) Efficient event ranking, which aims to find burst events and localized events given a particular region and time frame. To support aggregation at different spatial and temporal granularities, we propose a data structure called STREAMCUBE, which extends the data cube structure from the database community with spatial and temporal hierarchies. To achieve high scalability, we propose a divide-and-conquer method to construct the STREAMCUBE. To support flexible event ranking with different weights, we propose a top-k based index. Several efficient methods are used to speed up event similarity computations. Finally, we have conducted extensive experiments on real Twitter data. Experimental results show that our framework can provide meaningful results with high scalability.
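To make the idea of aggregating at several spatial and temporal granularities concrete, here is a minimal Python sketch. It is not the STREAMCUBE structure from the paper: the class `ToyCube`, the helpers `cell_key` and `time_key`, the fixed latitude/longitude grid, and the hour/day buckets are all assumptions chosen for illustration, and plain hashtag counting stands in for the paper's event clustering and ranking.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def cell_key(lat, lon, level):
    """Map a coordinate to a grid cell; level n splits the world into 2^n x 2^n cells."""
    n = 2 ** level
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    return (level, row, col)


def time_key(ts, granularity):
    """Truncate a datetime to an 'hour' or 'day' bucket label."""
    if granularity == "hour":
        return ts.strftime("%Y-%m-%dT%H")
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    raise ValueError(f"unknown granularity: {granularity}")


class ToyCube:
    """Hypothetical toy cube: hashtag counts per (spatial cell, time bucket)."""

    def __init__(self, max_level=4, granularities=("hour", "day")):
        self.max_level = max_level
        self.granularities = granularities
        self.cells = defaultdict(Counter)

    def add(self, lat, lon, ts, hashtags):
        # Single pass over the stream: each tweet updates one cell per
        # (spatial level, time granularity) combination and is then discarded.
        for level in range(self.max_level + 1):
            for g in self.granularities:
                key = (cell_key(lat, lon, level), time_key(ts, g))
                self.cells[key].update(hashtags)

    def top_k(self, lat, lon, ts, level, granularity, k=3):
        # Look up the cell covering the point at the requested zoom level
        # and time granularity, then return its k most frequent hashtags.
        key = (cell_key(lat, lon, level), time_key(ts, granularity))
        return self.cells.get(key, Counter()).most_common(k)


cube = ToyCube(max_level=2)
t = datetime(2015, 4, 13, 10, 30, tzinfo=timezone.utc)
cube.add(40.7, -74.0, t, ["#nyc", "#rain"])
cube.add(40.8, -73.9, t, ["#nyc"])
cube.add(-33.9, 151.2, t, ["#sydney"])
print(cube.top_k(40.7, -74.0, t, level=0, granularity="day", k=1))
print(cube.top_k(-33.9, 151.2, t, level=2, granularity="hour"))
```

Zooming out corresponds to querying a lower `level`, and switching from today to a longer window corresponds to a coarser time bucket. This sketch pays for that flexibility by storing every level eagerly; the divide-and-conquer construction and top-k index described in the abstract address scalability and ranking in ways this sketch does not attempt to reproduce.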
