
Hot Topic Detection Using Twitter Streaming Data

Abstract

With the increasing popularity and widespread use of social networks, it is becoming increasingly beneficial to analyse the data being shared in order to identify topics of public interest and specific social phenomena. This paper analyses the possibilities, challenges and difficulties of automatically and periodically collecting and analysing data from popular social networks, and explains in detail how data is acquired from Twitter and analysed. It describes an implementation of a simple hot topic detection algorithm based on texts acquired through Twitter's official API. The collected texts are pre-processed by removing stop-words and stemming the remaining words with Porter's stemming algorithm. Words from the pre-processed texts are then ranked according to their TF-IDF weights, obtained from a large-scale analysis, and grouped into a hot topic. The accuracy of the algorithm was evaluated by comparison with Twitter's official hot topic detection algorithm. A user interface was also implemented that allows the data acquisition and analysis process to be configured and the results to be viewed geographically.
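The pre-processing and ranking steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The stop-word list is a tiny hypothetical one. The stemmer implements only step 1a of Porter's algorithm (plural suffixes), while the full algorithm has several further steps. The TF-IDF variant is also an assumption: term frequency is counted over the current window of tweets and multiplied by a smoothed IDF, log((1 + N) / (1 + df)), computed over a hypothetical background collection of N tweets. Treating the top-k terms as the hot topic is likewise assumed. The names `preprocess`, `porter_step_1a` and `hot_topic` are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (the paper's list is not given).
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "be", "by", "to", "of", "in", "on",
    "and", "or", "for", "at", "with", "this", "that", "it", "i", "you", "rt",
}


def porter_step_1a(word: str) -> str:
    """Apply only step 1a of Porter's stemmer (SSES->SS, IES->I, SS->SS, S->'')."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def preprocess(text: str) -> list[str]:
    """Lower-case and tokenize, drop stop-words, then stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = (porter_step_1a(t) for t in tokens if t not in STOP_WORDS)
    return [s for s in stems if s]  # discard tokens stemmed to empty strings


def hot_topic(window: list[str], background: list[str], k: int = 3) -> list[str]:
    """Return the k terms with the highest TF-IDF weight in the current window.

    Assumed weighting: TF is the term's count over all tweets in the window;
    IDF is log((1 + N) / (1 + df)), where N is the number of background tweets
    and df is how many of them contain the term.
    """
    tf = Counter(term for tweet in window for term in preprocess(tweet))
    n_docs = len(background)
    df = Counter(term for tweet in background for term in set(preprocess(tweet)))
    scores = {
        term: count * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Hypothetical example data.
window = [
    "Earthquake hits the city",
    "Strong earthquake shakes the city",
    "City hit by another earthquake",
]
background = [
    "Good morning everyone",
    "Watching the game tonight",
    "Coffee and cake in the city",
    "Traffic is slow today",
]
print(hot_topic(window, background))  # → ['earthquake', 'hit', 'city']
```

Terms that are frequent in the current window but absent from the background collection, such as "earthquake", score highest. "city" occurs as often as "earthquake" in the window, but it ranks below "hit" because it also appears in the background, which lowers its IDF. A production system would replace `porter_step_1a` with a complete Porter stemmer.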