Conference: International Conference on Social Informatics

Predicting Poll Trends Using Twitter and Multivariate Time-Series Classification

Abstract

Social media outlets such as Twitter provide invaluable information for understanding the social and political climate surrounding particular issues. Millions of people who vary in age, social class, and political beliefs come together in conversation. However, drawing inferences from these tweets is challenging. Using tweets from the 2016 U.S. presidential campaign, this work addresses one main research question: can changes in a political candidate's poll score trend be accurately predicted from tweets created during the campaign? The novelty of this work is that we formulate the problem as a multivariate time-series classification problem, which fits the temporal nature of tweets, rather than as a traditional attribute-based classification problem. Features that represent various aspects of support for (or opposition to) a candidate are tracked on an hour-by-hour basis, and together they form a multivariate time series. One commonly used approach to this problem is a majority voting scheme, which assumes that the univariate time series from different features are equally important. To relax this assumption, a weighted shapelet transformation model is proposed. Extensive experiments on over 12 million tweets posted between November 2015 and January 2016 about four primary candidates (Bernie Sanders, Hillary Clinton, Donald Trump, and Ted Cruz) indicate that the multivariate time-series approach outperforms traditional attribute-based approaches.
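The contrast between equal-importance majority voting and a weighted combination can be illustrated with a minimal sketch. This is not the authors' weighted shapelet transformation model, whose weighting scheme the abstract does not describe. It assumes only that one classifier per univariate time series has already produced a trend label. The feature names (`tweet_volume`, `positive_mentions`, `negative_mentions`), the labels, and the weights are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-feature trend labels, scaling each vote by its feature weight."""
    scores = defaultdict(float)
    for feature, label in predictions.items():
        scores[label] += weights[feature]
    # Highest total weight wins; ties go to the label that sorts first,
    # so the result is deterministic.
    return min(scores, key=lambda label: (-scores[label], label))


def majority_vote(predictions):
    """Majority voting is the special case where every feature has equal weight."""
    return weighted_vote(predictions, {feature: 1.0 for feature in predictions})


# Hypothetical output of three per-feature classifiers for one candidate.
predictions = {
    "tweet_volume": "down",
    "positive_mentions": "down",
    "negative_mentions": "up",
}

# Hypothetical weights (assumption): for example, they could be derived from
# how well each feature's classifier performed on held-out data.
weights = {
    "tweet_volume": 0.2,
    "positive_mentions": 0.25,
    "negative_mentions": 0.9,
}

equal_importance_label = majority_vote(predictions)
weighted_label = weighted_vote(predictions, weights)
```

With these made-up numbers the two schemes disagree: two weak features outvote one strong feature under majority voting, while the weighted scheme follows the strong feature.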
