Twitter Analytics: Geotag Imputation, Forecasting, and Dynamic Variable Selection

Abstract

The popularity of social media has created vast repositories of open-source data with broad potential value. Researchers are actively mining these new, complex data sources to build predictive models for a wide range of applications. For example, Wikipedia is used to forecast influenza in the United States [Hickmann et al., 2015], Facebook is used for more effective advertising [Backstrom et al., 2010], and Twitter is used to forecast civil unrest in Latin America [Korkmaz et al., 2015]. In this dissertation, we develop statistical methodology that advances the analytical value of Twitter.

We begin in Chapter 2 by developing a geotag imputation method to predict the origin of individual tweets. Standard practice estimates the origin from the tweet's content, from network information, or from these two features treated independently. We show improved accuracy by using tweet text and user network information jointly. Moreover, we properly account for uncertainty, improving both the precision and the coverage of geotag imputation.

In Chapter 3, we focus on short-term forecasting, using daily word counts scraped from Twitter as model features. Conventional forecasting models for social media are typically static, so researchers implicitly assume the data are time-invariant. We consider a dynamic approach that accounts for possible time dependencies, allowing the forecasting model to evolve over time along with the data-generating process. For the problem of civil unrest, we use dynamic logistic regression to forecast the probability of protest in Latin America and show improved accuracy compared with the static baseline model. Furthermore, we develop a dynamic variable selection technique based on penalized credible regions to contextualize the reasons for protest. The proposed methodology is scalable and outperforms the current baseline.

In Chapter 4, we combine the geotag imputation and dynamic modeling methodology of the previous chapters. This final project is a first step toward using tweets with imputed geotags within geography-specific forecast models. The goal is to understand the impact of measurement error arising from the location uncertainty of tweets.
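The dynamic logistic regression described for Chapter 3 can be illustrated with a minimal sketch. It assumes a random-walk model for the coefficients, a forgetting factor (the hypothetical parameter `forget`) that inflates the coefficient covariance from one day to the next, and a single Laplace/Newton update per observation. The function names `sigmoid` and `dlr_step` and the toy feature vectors are illustrative only; the dissertation's actual priors, word-count features, and penalized-credible-region variable selection are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dlr_step(theta: Vector, sigma: Matrix, x: Sequence[float], y: int,
             forget: float = 0.99) -> Tuple[Vector, Matrix, float]:
    """One predict/update cycle of a (hypothetical) dynamic logistic regression.

    Returns the updated coefficients, their covariance, and the one-step-ahead
    forecast P(y = 1), which is computed *before* y is used.
    """
    d = len(theta)
    # Predict: random-walk state; dividing the covariance by `forget` (< 1)
    # widens it, so older days gradually lose influence.
    sigma_pred = [[sigma[i][j] / forget for j in range(d)] for i in range(d)]
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    w = p * (1.0 - p)  # curvature of the logistic log-likelihood at the forecast
    # Update the covariance: Sherman-Morrison form of (Sigma_pred^-1 + w x x^T)^-1.
    sx = [sum(sigma_pred[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + w * sum(xi * s for xi, s in zip(x, sx))
    sigma_new = [[sigma_pred[i][j] - w * sx[i] * sx[j] / denom
                  for j in range(d)] for i in range(d)]
    # Update the coefficients with one Newton step: theta + Sigma_new x (y - p).
    gain = [sum(sigma_new[i][j] * x[j] for j in range(d)) for i in range(d)]
    theta_new = [t + g * (y - p) for t, g in zip(theta, gain)]
    return theta_new, sigma_new, p


if __name__ == "__main__":
    # Toy daily features: [intercept, scaled count of a protest-related word].
    days = [([1.0, 0.2], 0), ([1.0, 1.5], 1), ([1.0, 0.1], 0), ([1.0, 2.0], 1)]
    theta, sigma = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for x, y in days:
        theta, sigma, p = dlr_step(theta, sigma, x, y)
        print(f"forecast={p:.3f} observed={y} theta={theta}")
```

The forgetting factor is what makes this sketch dynamic: with `forget = 1.0` the recursion is approximately a sequential fit of a static Bayesian logistic regression, while smaller values weight recent days more heavily and let the coefficients drift with the data-generating process.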

Bibliographic Details

  • Author: Bakerman, Jordan
  • Author affiliation: North Carolina State University
  • Degree-granting institution: North Carolina State University
  • Discipline: Statistics
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 143
  • Original format: PDF
  • Language: English