International Conference on Web Engineering

Tracking Dengue Epidemics Using Twitter Content Classification and Topic Modelling

Abstract

Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue and Zika in Brazil and other tropical regions has long been a priority for governments in affected areas. Streaming social media content, such as Twitter posts, is increasingly used for health surveillance applications such as flu detection. However, previous work has not addressed the complexity introduced by drastic seasonal changes in Twitter content across multiple epidemic outbreaks. To address this gap, this paper contrasts two complementary approaches to detecting Twitter content relevant to Dengue outbreak detection: supervised classification and unsupervised clustering using topic modelling. Each approach has benefits and shortcomings. Our classifier achieves a prediction accuracy of about 80% from a small training set of about 1,000 instances, but its reliance on manual annotation makes it hard to track seasonal changes in the nature of the epidemics, such as the emergence of new virus types in certain geographical locations. In contrast, LDA-based topic modelling scales well, generating cohesive and well-separated clusters from larger samples. However, although clusters can easily be regenerated as epidemics change, this approach makes it hard to clearly segregate relevant tweets into well-defined clusters.
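The abstract does not state which LDA implementation, hyperparameters, or tweet preprocessing the authors used. As a rough illustration of how LDA groups tweets into topics without manual labels, here is a minimal pure-Python collapsed Gibbs sampler, assuming tweets have already been tokenised into word lists. The names `lda_gibbs` and `top_words`, the default `alpha`, `beta`, and `iterations` values, and the toy tweets are hypothetical choices for this sketch, not the authors' code or data.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Sketch of LDA via collapsed Gibbs sampling (hypothetical defaults).

    docs: list of token lists (e.g. pre-tokenised tweets).
    Returns (doc_topic, topic_word, vocab): smoothed per-document topic
    proportions and per-topic word distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    n_dk = [[0] * num_topics for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(num_topics)]  # times word w is assigned to topic k
    n_k = [0] * num_topics                       # total tokens assigned to topic k
    z = []                                       # current topic of every token

    # Randomly assign an initial topic to each token and fill the count tables.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(num_topics)
    ]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n most probable words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Hypothetical toy tweets, not data from the paper.
tweets = [
    ["dengue", "mosquito", "fever"],
    ["football", "goal", "match"],
    ["dengue", "fever", "hospital"],
    ["goal", "match", "team"],
]
doc_topic, topic_word, vocab = lda_gibbs(tweets, num_topics=2, iterations=50, seed=1)
print(top_words(topic_word, vocab))
```

Because no labels are needed, the model can simply be re-run on a fresh sample when the epidemic changes. The open question, as the abstract notes, is whether any resulting topic cleanly isolates the outbreak-relevant tweets.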