IEEE International Conference on Semantic Computing

Predicting Community Engagement on Twitter on Environmental Health Hazards


Abstract

In this empirical study, a framework was developed for binary and multi-class classification of Twitter data. We first introduce a manually built gold-standard dataset of 4,000 tweets related to environmental health hazards in Barbados for the period 2014-2018. Binary classification was then used to categorize each tweet as relevant or irrelevant. Next, multi-class classification was used to assign each relevant tweet to one of four types of community engagement: reporting information, expressing negative engagement, expressing positive engagement, and asking for information. Results indicate that, for binary classification, a combination of TF-IDF, psychometric, linguistic, sentiment, and Twitter-specific features with a Random Forest algorithm performed best (87% F1 score). For multi-class classification, TF-IDF features with a Decision Tree algorithm performed best (74% F1 score).
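The abstract relies on two quantities that it does not define: TF-IDF term weights and the F1 score. The following is a minimal sketch of both in plain Python, not the authors' implementation. It assumes a common TF-IDF variant (length-normalised term frequency times an unsmoothed log inverse document frequency) and a macro average for the multi-class F1; the abstract specifies neither choice. The function names and the example tweets are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: (term count / document length) * log(N / df).
    The paper's exact weighting scheme is not given in the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in Counter(tokens).items()
        }
        for tokens in tokenized
    ]


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_score(y_true, y_pred, label) for label in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical tweets, for illustration only.
    tweets = [
        "flooding reported in bridgetown",
        "great beach day",
        "flooding again",
    ]
    for weights in tfidf(tweets):
        print(weights)

    truth = ["relevant", "relevant", "irrelevant", "relevant"]
    predicted = ["relevant", "irrelevant", "irrelevant", "relevant"]
    print(f1_score(truth, predicted, positive="relevant"))
    print(macro_f1(truth, predicted))
```

In this variant, a term that appears in every document receives weight zero, so the features emphasise words that distinguish one tweet from the others. The macro average weights all four engagement classes equally regardless of how many tweets each contains.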
