Numero: a statistical framework to define multivariable subgroups in complex population-based datasets

Gao Song; Mutter Stefan; Casey Aaron; Makinen Ville-Petteri

Abstract

Large-scale epidemiological and population data provide opportunities to identify subgroups of people who are at risk of disease or exposed to adverse environments. Clustering algorithms are popular data-driven tools to identify these subgroups; however, relying exclusively on algorithms may not produce the best results if the dataset does not have a clustered structure. For this reason, we propose a framework (the R-library Numero) that combines the self-organizing map algorithm, permutation analysis for statistical evidence and a final expert-driven subgrouping step. We used Numero to define subgroups in two examples without an obvious clustering structure: a biomedical dataset of kidney disease and another dataset of community-level socioeconomic indicators. We benchmarked the Numero subgroupings against popular clustering algorithms (principal components, K-means and hierarchical clustering). The Numero subgroupings were more intuitive and easier to interpret without losing mathematical quality. Therefore, we expect Numero to be useful for exploratory analyses of population-based epidemiological datasets.
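The three stages named in the abstract (self-organizing map, permutation analysis, expert-driven subgrouping) can be illustrated with a minimal sketch. This is not the Numero implementation or its API: it is a plain NumPy stand-in, and every function name, the map size, the test statistic (variance of per-unit means) and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=20, seed=0):
    """Train a small rectangular self-organizing map (illustrative, not Numero's algorithm)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise prototypes from randomly chosen data rows.
    protos = data[rng.choice(len(data), size=rows * cols, replace=True)].astype(float)
    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        sigma = max(rows, cols) / 2 * (1 - frac) + 0.5 * frac  # shrinking neighbourhood radius
        lr = 0.5 * (1 - frac) + 0.05 * frac                    # decaying learning rate
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(((protos - x) ** 2).sum(axis=1))   # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)         # squared grid distance to the BMU
            h = np.exp(-d2 / (2 * sigma ** 2))                 # Gaussian neighbourhood weights
            protos += lr * h[:, None] * (x - protos)
    return protos, grid

def assign_units(data, protos):
    """Index of the nearest prototype for every data row."""
    d = ((data[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def unit_means(values, units, n_units):
    """Mean of `values` within each map unit; empty units fall back to the overall mean."""
    sums = np.bincount(units, weights=values, minlength=n_units)
    counts = np.bincount(units, minlength=n_units)
    means = np.full(n_units, values.mean())
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled]
    return means

def permutation_pvalue(values, units, n_units, n_perm=200, seed=0):
    """Permutation test: do unit means vary across the map more than expected by chance?"""
    rng = np.random.default_rng(seed)
    observed = unit_means(values, units, n_units).var()
    null = np.array([unit_means(rng.permutation(values), units, n_units).var()
                     for _ in range(n_perm)])
    return (1 + (null >= observed).sum()) / (1 + n_perm)

# Synthetic data without a clustered structure (assumption: a single Gaussian cloud).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
outcome = X[:, 0] + 0.3 * rng.normal(size=300)  # varies smoothly along the first input
noise = rng.normal(size=300)                    # unrelated to the map inputs

protos, grid = train_som(X)
units = assign_units(X, protos)
p_outcome = permutation_pvalue(outcome, units, len(protos))
p_noise = permutation_pvalue(noise, units, len(protos))

# Stand-in for the expert step: a hand-written split of map units into two regions.
# In practice this choice would follow visual inspection of the coloured maps.
subgroup_of_unit = np.where(grid[:, 1] < 2, "A", "B")
subgroups = subgroup_of_unit[units]
print(p_outcome, p_noise, np.unique(subgroups, return_counts=True))
```

The point of the sketch is the division of labour: the map only arranges observations, the permutation test indicates which variables show regional patterns worth interpreting, and the subgroup boundaries are drawn by a person rather than by a clustering criterion.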

Bibliographic details

Source
International Journal of Epidemiology: Official Journal of the International Epidemiological Association | 2019, Issue 2 | 6 pages
Authors
Gao Song; Mutter Stefan; Casey Aaron; Makinen Ville-Petteri
Author affiliations

South Australian Health & Medical Research Institute, Heart Health Theme, Adelaide, SA, Australia (listed for each of the four authors)

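Indexing information
Original format: PDF
Language: English (eng)
Chinese Library Classification: Epidemiology and epidemic prevention
Keywords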
Multivariable statistics; data-driven subgrouping; self-organizing map; population data;
