Journal: Decision Support Systems

Kernel-based features for predicting population health indices from geocoded social media data


Abstract

When tweets are used to predict a population health index, the scale of the data has made it common practice to aggregate tweets by population before learning features that characterize each population. Aggregation avoids the computational cost of extracting features from every individual tweet. However, it can discard much information about the population, because the distribution of textual features within a population may be important for identifying that population's health index. In addition, relationships may exist between features, and those relationships may also carry predictive information about the health index. In this paper, we propose mid-level features, namely kernel-based features, for predicting the health indices of populations from social media data. The kernel-based features are extracted from the distributions of textual features over a population's tweets and encode the relationships between individual textual features in a kernel function. We implemented the features using three different kernel functions and applied them to two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System. Experimental results show that the kernel-based features achieved significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features to a wide range of population-level data analytics applications. (C) 2017 Elsevier B.V. All rights reserved.
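The abstract does not say which three kernel functions were used or how the distributions were represented, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes each tweet has already been mapped to a vector of textual feature values in [0, 1] (for example, topic proportions), summarizes each feature's distribution over a population's tweets as a histogram, and compares every pair of features with a Gaussian (RBF) kernel. The function names, the histogram representation, the bin count, and the `gamma` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def feature_histograms(tweet_features, bins=10):
    """Summarize each textual feature's distribution over a population's tweets.

    tweet_features: (n_tweets, n_features) array, values assumed to lie in [0, 1].
    Returns an (n_features, bins) array whose rows each sum to 1.
    """
    n_features = tweet_features.shape[1]
    hists = np.empty((n_features, bins))
    for j in range(n_features):
        counts, _ = np.histogram(tweet_features[:, j], bins=bins, range=(0.0, 1.0))
        hists[j] = counts / counts.sum()
    return hists


def rbf_kernel_matrix(rows, gamma=1.0):
    """Gaussian (RBF) kernel between every pair of rows (an assumed kernel choice)."""
    sq_norms = np.sum(rows ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * rows @ rows.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_feature_vector(tweet_features, bins=10, gamma=1.0):
    """Fixed-length population descriptor built from pairwise feature kernels.

    The output length depends only on the number of textual features,
    not on how many tweets the population produced.
    """
    hists = feature_histograms(tweet_features, bins)
    gram = rbf_kernel_matrix(hists, gamma)
    upper = np.triu_indices(gram.shape[0], k=1)  # each feature pair once
    return gram[upper]


# Synthetic stand-ins for two populations with different tweet counts.
rng = np.random.default_rng(0)
county_a = rng.random((500, 4))
county_b = rng.random((120, 4))
vec_a = kernel_feature_vector(county_a)
vec_b = kernel_feature_vector(county_b)
```

Both populations yield a vector with one entry per pair of textual features, so the vectors could be passed to any standard regressor for the health index. The data here is random and carries no health signal; it only shows the shapes involved.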
