BMC Public Health

Optimizing data collection for public health decisions: a data mining approach

Abstract

Background: Collecting data can be cumbersome and expensive. A lack of relevant, accurate, and timely data for research to inform policy may negatively impact public health. The aim of this study was to test whether carefully removing items from two community nutrition surveys, guided by a data mining technique called feature selection, can (a) identify a reduced dataset while (b) not damaging the signal inside that data.

Methods: The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on 885 retail food outlets in two counties in West Virginia between May and November 2011. A reduced dataset was identified for each outlet type using feature selection. Coefficients from linear regression modeling were used to weight the items in the reduced datasets. Weighted item values were summed with the error term to compute reduced-item survey scores. Scores produced by the full survey were compared with the reduced-item scores using a Wilcoxon rank-sum test.

Results: Feature selection identified 9 store and 16 restaurant survey items as significant predictors of the score produced by the full survey. The linear regression models built from the reduced feature sets had R² values of 92% and 94% for restaurant and grocery store data, respectively.

Conclusions: While there are many potentially important variables in any domain, the most useful set may be only a small subset. Using feature selection in the initial phase of data collection to identify the most influential variables may be a useful way to greatly reduce the amount of data needed, thereby reducing cost.
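The pipeline described in the Methods (select a subset of items, weight them by linear regression, then compare reduced and full scores with a rank-sum test) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation. The abstract does not name the feature selection algorithm, so a simple correlation filter stands in for it. The full score is assumed to be an unweighted sum of items, the "error term" is interpreted as the fitted intercept, and all function names, item counts, and point ranges are hypothetical.

```python
import math
import numpy as np


def select_items(items, full_score, k):
    """Stand-in feature selection (assumption): keep the k items whose
    absolute Pearson correlation with the full score is largest."""
    corr = np.array([
        abs(np.corrcoef(items[:, j], full_score)[0, 1])
        for j in range(items.shape[1])
    ])
    corr = np.nan_to_num(corr)  # a constant item has undefined correlation
    return np.sort(np.argsort(corr)[::-1][:k])


def fit_reduced_score(items, full_score, selected):
    """Ordinary least squares of the full score on the selected items.
    Returns the coefficients (intercept first), the reduced score, and R^2."""
    X = np.column_stack([np.ones(len(items)), items[:, selected]])
    coef, *_ = np.linalg.lstsq(X, full_score, rcond=None)
    reduced = X @ coef  # weighted items plus intercept
    ss_res = np.sum((full_score - reduced) ** 2)
    ss_tot = np.sum((full_score - full_score.mean()) ** 2)
    return coef, reduced, 1.0 - ss_res / ss_tot


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test, normal approximation, midranks for ties,
    no tie correction in the variance. Returns (z, two-sided p-value)."""
    pooled = np.concatenate([x, y])
    _, inverse, counts = np.unique(pooled, return_inverse=True, return_counts=True)
    midranks = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = midranks[inverse]
    n1, n2 = len(x), len(y)
    w = ranks[:n1].sum()  # rank sum of the first sample
    z = (w - n1 * (n1 + n2 + 1) / 2.0) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return float(z), math.erfc(abs(z) / math.sqrt(2.0))


# Synthetic survey (hypothetical): 200 outlets, 30 items worth 1 to 5 points each.
rng = np.random.default_rng(0)
max_points = rng.integers(1, 6, size=30)
items = rng.integers(0, max_points + 1, size=(200, 30))
full_score = items.sum(axis=1).astype(float)

selected = select_items(items, full_score, k=9)
coef, reduced_score, r2 = fit_reduced_score(items, full_score, selected)
z, p = rank_sum_test(full_score, reduced_score)

print("selected item indices:", selected.tolist())
print("R^2 of reduced model:", round(float(r2), 3))
print("rank-sum z:", round(z, 3), "p:", round(p, 3))
```

In this sketch a large p-value means the rank-sum test finds no evidence that the reduced-item scores are distributed differently from the full scores. With real survey data, an established routine such as `scipy.stats.ranksums` would normally replace the hand-written test.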
