International Journal of Epidemiology: Official Journal of the International Epidemiological Association

Numero: a statistical framework to define multivariable subgroups in complex population-based datasets


Abstract

Large-scale epidemiological and population data provide opportunities to identify subgroups of people who are at risk of disease or exposed to adverse environments. Clustering algorithms are popular data-driven tools to identify these subgroups; however, relying exclusively on algorithms may not produce the best results if the dataset does not have a clustered structure. For this reason, we propose a framework (the R-library Numero) that combines the self-organizing map algorithm, permutation analysis for statistical evidence and a final expert-driven subgrouping step. We used Numero to define subgroups in two examples without an obvious clustering structure: a biomedical dataset of kidney disease and another dataset of community-level socioeconomic indicators. We benchmarked the Numero subgroupings against popular clustering algorithms (principal components, K-means and hierarchical clustering). The Numero subgroupings were more intuitive and easier to interpret without losing mathematical quality. Therefore, we expect Numero to be useful for exploratory analyses of population-based epidemiological datasets.
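
The three stages named in the abstract (a self-organizing map, permutation analysis and an expert-driven subgrouping step) can be illustrated with a minimal NumPy sketch. It does not use or reproduce the API of the Numero R library. The function names (`train_som`, `assign_units`, `unit_means`, `permutation_p`), the map size, the learning schedule, the test statistic (variance of per-unit means) and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=10, seed=0):
    """Train a small rectangular SOM (illustrative, not Numero's implementation)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit prototypes from randomly chosen data rows.
    protos = data[rng.choice(len(data), size=rows * cols, replace=True)].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            frac = step / n_steps
            lr = 0.5 * (1.0 - frac)                             # decaying learning rate
            sigma = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking neighbourhood
            bmu = np.argmin(((protos - data[i]) ** 2).sum(axis=1))
            grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            protos += lr * h[:, None] * (data[i] - protos)
            step += 1
    return protos, grid

def assign_units(data, protos):
    """Index of the best-matching map unit for every data row."""
    d2 = ((data[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def unit_means(values, units, n_units):
    """Mean of a variable within each map unit; empty units get the overall mean."""
    sums = np.bincount(units, weights=values, minlength=n_units)
    counts = np.bincount(units, minlength=n_units)
    return np.where(counts > 0, sums / np.maximum(counts, 1), values.mean())

def permutation_p(values, units, n_units, n_perm=200, seed=0):
    """Permutation P-value for how strongly a variable varies across the map."""
    rng = np.random.default_rng(seed)
    observed = unit_means(values, units, n_units).var()
    null = np.array([
        unit_means(rng.permutation(values), units, n_units).var()
        for _ in range(n_perm)
    ])
    return (1 + (null >= observed).sum()) / (1 + n_perm)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 2))        # synthetic data with no clustered structure
    unrelated = rng.normal(size=300)     # variable unrelated to the map inputs
    protos, grid = train_som(X, seed=1)
    units = assign_units(X, protos)
    print("P (training variable):", permutation_p(X[:, 0], units, len(protos)))
    print("P (unrelated variable):", permutation_p(unrelated, units, len(protos)))
    # Expert-driven step (hypothetical rule): an analyst inspects the map
    # colourings and assigns units to subgroups; here, by grid column.
    subgroup = np.where(grid[units, 1] < 2, "A", "B")
    print("Subgroup sizes:", {s: int((subgroup == s).sum()) for s in ("A", "B")})
```

In this sketch the permutation step only indicates which variables show more regional variation on the map than expected by chance. The final assignment of map units to subgroups is left to the analyst, which mirrors the expert-driven step described in the abstract.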