International Joint Conference on Computer Science and Software Engineering

Online Emerging Topic Detection on Twitter Using Random Forest with Stock Indicator Features


Abstract

Social media is one of the fastest and most impactful communication channels. By monitoring Twitter streams, we can detect emerging topics and understand events around the world. Prior attempts at online topic detection on Twitter can only detect bursty topics, using user-defined keywords along with simple rules. In this paper, we propose an algorithm to detect emerging topics on Twitter streams. A clustering technique is applied to aggregate sets of keywords into topics. Since an emerging topic occurs continuously over time, topics are merged with a stateful technique that accumulates topics across different time intervals. To detect both high-signal and small-to-medium-signal topics, we use statistical features based on average, acceleration, and z-score. Moreover, we propose to include two stock indicator features: Relative Strength Index (RSI) and Stochastic Oscillator (STOCH). These are common features for trend (oversold and overbought) detection in stock analysis, a task similar to topic detection on Twitter. To capture arbitrary event patterns, a Random Forest (RF) classifier detects emerging keywords using the five features above. To evaluate performance, we created and published a corpus by collecting Twitter data for 10 days, comprising over 80 million tweets, and labeling a total of 161 events along with their related keywords. The experiment was conducted on this collected data. The F1 results show that our model outperforms all baselines (TwitterMonitor, SigniTrend, and TopicSketch) in terms of detected keywords and topics.
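The abstract names five per-keyword features but does not give their formulas. The sketch below is one plausible reading, not the authors' implementation. It treats a keyword's tweet counts per time interval as a price-like series and applies the textbook simple-average forms of RSI and the stochastic oscillator (%K). The function names, the `period` parameter, the second-difference definition of acceleration, and the neutral value of 50 for flat series are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def rsi(series, period=14):
    """Simple-average RSI over the last `period` interval-to-interval changes."""
    window = series[-(period + 1):]
    deltas = [b - a for a, b in zip(window, window[1:])]
    gains = sum(d for d in deltas if d > 0)
    losses = sum(-d for d in deltas if d < 0)
    if gains == 0 and losses == 0:
        return 50.0  # assumption: a flat series is treated as neutral
    if losses == 0:
        return 100.0  # only increases: maximally "overbought"
    rs = gains / losses  # equals average gain / average loss
    return 100.0 - 100.0 / (1.0 + rs)


def stoch_k(series, period=14):
    """Stochastic oscillator %K: position of the latest value in the recent range."""
    window = series[-period:]
    low, high = min(window), max(window)
    if high == low:
        return 50.0  # assumption: neutral value when the range is empty
    return 100.0 * (series[-1] - low) / (high - low)


def keyword_features(counts, period=14):
    """Build the five-feature vector for one keyword (hypothetical definitions)."""
    if len(counts) < 3:
        raise ValueError("need at least three intervals")
    history = counts[:-1][-period:]  # intervals before the current one
    mu, sigma = mean(history), pstdev(history)
    z_score = (counts[-1] - mu) / sigma if sigma > 0 else 0.0
    # assumption: acceleration is the second difference of the counts
    acceleration = (counts[-1] - counts[-2]) - (counts[-2] - counts[-3])
    return {
        "average": mean(counts[-period:]),
        "acceleration": acceleration,
        "z_score": z_score,
        "rsi": rsi(counts, period),
        "stoch": stoch_k(counts, period),
    }


# Counts of one keyword over five consecutive intervals, ending in a burst.
features = keyword_features([1, 2, 3, 4, 10], period=4)
```

A vector like `features` could then be passed, one row per keyword and interval, to a Random Forest classifier (for example scikit-learn's `RandomForestClassifier`) trained on labeled emerging and non-emerging keywords. That pairing is an illustration of the pipeline the abstract describes, not a confirmed detail of the paper.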
