Journal: JMIR Public Health and Surveillance

Building a National Neighborhood Dataset From Geotagged Twitter Data for Indicators of Happiness, Diet, and Physical Activity

Abstract

Background: Studies suggest that where people live, play, and work can influence health and well-being. However, the dearth of neighborhood data, especially data that are timely and consistent across geographies, hinders understanding of the effects of neighborhoods on health. Social media data represent a possible new data resource for neighborhood research.

Objective: The aim of this study was to build, from geotagged Twitter data, a national neighborhood database with area-level indicators of well-being and health behaviors.

Methods: We used Twitter's streaming application programming interface (API) to continuously collect a random 1% subset of publicly available geolocated tweets for 1 year (April 2015 to March 2016). We collected 80 million geotagged tweets from 603,363 unique Twitter users across the contiguous United States. We validated our machine learning algorithms for constructing indicators of happiness, food, and physical activity by comparing predicted values to those generated by human labelers. Geotagged tweets were spatially mapped to the 2010 census tract and zip code areas they fall within, which enabled further assessment of the associations between Twitter-derived neighborhood variables and neighborhood demographic, economic, business, and health characteristics.

Results: Machine-labeled and manually labeled tweets agreed at a high rate: for dichotomized labels, accuracy was 78% for happiness, 83% for food, and 85% for physical activity, with F scores of 0.54, 0.86, and 0.90, respectively. About 20% of tweets were classified as happy. Relatively few terms (fewer than 25) were necessary to characterize the majority of tweets on food and physical activity. Data from over 70,000 census tracts in the United States suggest that census tract factors such as the percentage of African American residents and economic disadvantage were associated with lower census tract happiness. Urbanicity was related to a higher frequency of fast food tweets. Greater numbers of fast food restaurants predicted a higher frequency of fast food mentions. Surprisingly, fitness centers and nature parks were only modestly associated with a higher frequency of physical activity tweets. Greater state-level happiness, positivity toward physical activity, and positivity toward healthy foods, assessed via tweets, were associated with lower all-cause mortality, lower prevalence of chronic conditions such as obesity and diabetes, and lower physical inactivity and smoking, after controlling for state median income, median age, and percentage white non-Hispanic.

Conclusions: Machine learning algorithms can be built with relatively high accuracy to characterize sentiment, food, and physical activity mentions on social media. Such data can be used to construct neighborhood indicators consistently and cost-effectively. Access to neighborhood data, in turn, can be leveraged to better understand neighborhood effects and address social determinants of health. We found that neighborhoods with social and economic disadvantage, high urbanicity, and more fast food restaurants may exhibit lower happiness and fewer healthy behaviors.
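The Results report 78% accuracy but an F score of only 0.54 for the happiness label. One plausible reading, not stated in the abstract, is class imbalance: when the positive class is a minority, accuracy can stay high while the F score for that class is modest. The sketch below illustrates this with a binary confusion matrix. The counts are invented for illustration and are not taken from the study, and the function name `accuracy_and_f1` is hypothetical. It also assumes the F score is the F1 score of the positive class, which the abstract does not specify.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1 of the positive class) for a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that the classifier found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical labeled sample: 1,000 tweets, 200 of them (20%) truly "happy".
# These numbers are made up to show the effect; they are NOT the study's data.
acc, f1 = accuracy_and_f1(tp=100, fp=100, fn=100, tn=700)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

In this made-up sample the classifier is right on 80% of tweets overall, yet it finds only half of the happy tweets and only half of its "happy" predictions are correct, so the F1 score is 0.50. This is why the abstract's accuracy and F score figures are best read together rather than in isolation.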
