Conference: IEEE International Conference on Awareness Science and Technology

Design of a method to support Twitter based event detection with heterogeneous data resources

Abstract

There is high demand for observing events of public concern in real time by analyzing Big Data. Twitter is a suitable data resource for event detection because of the large amount of data and users in the Twitter system and the high frequency of data generation. Many studies have demonstrated that events can be detected from tweets. However, two problems remain. The first is the reliability of information: tweets are very noisy, and false information appears in them. The second is that a single tweet carries too little information: a tweet is restricted to 140 characters, so it cannot describe much. One possible solution is to retrieve additional information related to a Twitter-based event detection result from heterogeneous data resources such as articles, Web pages, and blog posts. Once retrieved, this information can be used to validate the detection result and can also be provided as further information to enhance it. However, properly retrieving related content from heterogeneous data resources is not easy, because the data are of different types. To solve this problem, we propose a method that retrieves additional information related to a set of tweets detected as an event from heterogeneous data resources by measuring the similarity (distance) between them with Normalized Compression Distance. We mainly consider articles on the Web as the additional information for Twitter-based event detection, since they are well validated and edited. We evaluate the proposed method in experiments, and the results show that it is highly robust to noise and performs well in practical situations.
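The abstract names Normalized Compression Distance (NCD) as the similarity measure but gives no implementation details. The following is a minimal sketch of the standard NCD formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The choice of zlib as the compressor, the helper names `compressed_size`, `ncd`, and `rank_articles`, and the idea of joining an event's tweets into one string are all assumptions made for illustration and are not taken from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts.

    Values near 0 suggest the texts share much content; values near 1
    suggest they share little. Real compressors can yield values
    slightly above 1.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_articles(tweets: list[str], articles: list[str]) -> list[tuple[float, str]]:
    """Rank candidate articles by NCD to the tweets of one detected event.

    Hypothetical helper: joins the event's tweets into a single text and
    returns (distance, article) pairs, closest article first.
    """
    event_text = "\n".join(tweets)
    scored = [(ncd(event_text, article), article) for article in articles]
    return sorted(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    tweets = [
        "Strong earthquake struck the coastal city this morning",
        "Rescue teams deployed to damaged buildings after the earthquake",
    ]
    articles = [
        "The local bakery introduced seven new pastry flavours, "
        "including pistachio cream and salted caramel.",
        "A strong earthquake struck the coastal city this morning. "
        "Rescue teams were deployed to damaged buildings.",
    ]
    for distance, article in rank_articles(tweets, articles):
        print(f"{distance:.3f}  {article}")
```

Because NCD works on raw bytes, the same function applies to tweets, articles, or blog posts without format-specific feature extraction. Very short inputs compress poorly, so comparing a whole set of tweets at once is likely to give more stable scores than comparing single tweets.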