Scandinavian Journal of Public Health

Suitability of random forest analysis for epidemiological research: Exploring sociodemographic and lifestyle-related risk factors of overweight in a cross-sectional design


Abstract

Aims: Factors that contribute to the development of overweight are numerous and form a complex structure with many unknown interactions and associations. We aimed to explore this structure (i.e. the mutual importance or hierarchy of sociodemographic and lifestyle-related risk factors of being overweight) using a machine-learning technique called random forest (RF). The results were compared with traditional logistic regression (LR) analysis. Methods: The cross-sectional FINRISK 2007 Study included 4757 Finns (aged 25-74 years). Information on participants' lifestyle and sociodemographic characteristics was collected with questionnaires. Diet was assessed using a validated food-frequency questionnaire. Height and weight were measured. Participants with a body mass index (BMI) ≥ 25 kg/m² were classified as overweight. The R statistical software was used to run the RF analysis (the 'randomForest' package) to derive estimates of variable importance and out-of-bag error, which were compared with those of an LR model. Results: In total, 704 (32%) men and 1119 (44%) women had normal BMI, whereas 1502 (69%) men and 1432 (57%) women had BMI ≥ 25. Estimated error rates for the models were similar (RF vs. LR: 42% vs. 40% for men, 38% vs. 35% for women). Both models ranked age, education and physical activity as the most important risk factors for being overweight, but RF ranked macronutrients (carbohydrates and protein) as more important than LR did. Conclusions: RF did not demonstrate higher power in variable selection compared to LR in our study. The features of RF are more likely to appear beneficial in settings with a larger number of predictors.
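
The out-of-bag (OOB) error mentioned in the Methods can be illustrated with a minimal sketch. This is not the authors' analysis, which used R's 'randomForest' package on FINRISK data. It is a simplified Python illustration under stated assumptions: the data are synthetic, the two predictors are hypothetical, each "tree" is a one-split stump, and every stump considers all predictors (plain bagging, without the random predictor subsampling of a true random forest). The idea it shows is that each observation is scored only by the models whose bootstrap sample did not contain it.

```python
import random

def fit_stump(rows, labels):
    """Return a predict function for the single-split rule
    (feature, threshold, direction) with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for sign in (1, -1):
                errors = sum((sign * (r[f] - t) >= 0) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    _, f, t, sign = best
    return lambda r: sign * (r[f] - t) >= 0

def oob_error(rows, labels, n_trees=25, seed=0):
    """Bagged stumps; each row is scored only by stumps that did not
    see it in their bootstrap sample (its out-of-bag stumps)."""
    rng = random.Random(seed)
    n = len(rows)
    votes = [[0, 0] for _ in range(n)]  # [votes for False, votes for True]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        stump = fit_stump([rows[i] for i in idx], [labels[i] for i in idx])
        for i in set(range(n)) - set(idx):          # out-of-bag rows
            votes[i][stump(rows[i])] += 1
    # Rows that were never out-of-bag cannot be scored and are skipped.
    scored = [(v, y) for v, y in zip(votes, labels) if sum(v) > 0]
    wrong = sum((v[1] > v[0]) != y for v, y in scored)
    return wrong / len(scored)

# Synthetic data (an assumption for illustration): predictor 0 drives the
# outcome, predictor 1 is noise, and about 10% of labels are flipped.
rng = random.Random(1)
rows = [(rng.random(), rng.random()) for _ in range(150)]
labels = [(x0 > 0.5) != (rng.random() < 0.1) for x0, _ in rows]

err = oob_error(rows, labels)
print(f"OOB error: {err:.2f}")
```

Because every observation is left out of roughly a third of the bootstrap samples, the OOB error behaves like a built-in validation estimate, which is why it can be set beside the error rate of a separately fitted LR model.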
