Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

Marston L; Peacock JL; Yu K; Brocklehurst P; Calvert SA; Greenough A; Marlow N


Abstract

Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods that can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets that differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. For the continuous outcome, two-level models produced similar results in the larger dataset, whereas generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates in the smaller dataset. For the dichotomous outcome, most methods gave similar odds ratios and 95% confidence intervals within datasets, the exception being multilevel modelling fitted by Gauss-Hermite quadrature (ML GH 'xtlogit' in Stata). For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that the clustering be accounted for in the analysis, using either logistic regression with adjusted standard errors or multilevel modelling. If, however, only a small percentage of clusters in the dataset are larger than size 1 (e.g. a population dataset of children with few multiple births), there appears to be less need to adjust for clustering.
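To illustrate what "adjusted standard errors" do with small clusters, here is a minimal Python sketch. It is not the authors' analysis, which used Stata. It compares a naive standard error for a proportion with a cluster-robust (sandwich) one, in which residuals are summed within each set of multiples before squaring. The function name `proportion_ses` and the toy data are hypothetical. The model is intercept-only, and no small-sample correction is applied. For an intercept-only model, the delta method gives the same robust variance on the probability scale whether the link is identity or logit. When every cluster has size 1, the two standard errors coincide, which is consistent with the abstract's point that adjustment matters less when few clusters exceed size 1.

```python
import math
from collections import defaultdict


def proportion_ses(outcomes, cluster_ids):
    """Estimate a proportion with naive and cluster-robust standard errors.

    outcomes    -- list of 0/1 values, one per infant
    cluster_ids -- list of cluster labels (e.g. one per mother), same length

    Returns (p_hat, naive_se, robust_se). robust_se is a sandwich
    (cluster-robust) SE for an intercept-only model without any
    small-sample correction (hypothetical helper, for illustration only).
    """
    if len(outcomes) != len(cluster_ids) or not outcomes:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(outcomes)
    p_hat = sum(outcomes) / n

    # Naive SE: treats every infant as an independent observation.
    naive_se = math.sqrt(p_hat * (1 - p_hat) / n)

    # Cluster-robust SE: sum residuals within each cluster before squaring,
    # so within-cluster cross-products enter the variance; positively
    # correlated outcomes in twins or triplets therefore inflate it.
    cluster_resid = defaultdict(float)
    for y, c in zip(outcomes, cluster_ids):
        cluster_resid[c] += y - p_hat
    robust_var = sum(r * r for r in cluster_resid.values()) / n ** 2
    return p_hat, naive_se, math.sqrt(robust_var)


# Hypothetical toy data: four singletons and three twin pairs with concordant outcomes.
outcomes = [1, 0, 0, 1, 1, 1, 0, 0, 1, 1]
cluster_ids = ["s1", "s2", "s3", "s4", "t1", "t1", "t2", "t2", "t3", "t3"]
p_hat, naive_se, robust_se = proportion_ses(outcomes, cluster_ids)
print(f"p = {p_hat:.2f}, naive SE = {naive_se:.3f}, robust SE = {robust_se:.3f}")
# Squared ratio of the two SEs: a crude empirical design effect.
print(f"design effect ~ {(robust_se / naive_se) ** 2:.2f}")
```

With concordant twins, the robust standard error exceeds the naive one. If only a few clusters have more than one member, the two tend to move closer together. The robust estimate is itself imprecise when there are few clusters, so this sketch is only a way to see the mechanism, not a substitute for the methods compared in the paper.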


Bibliographic details

Source: Paediatric and perinatal epidemiology | 2009, no. 4 | 13 pages
Original format: PDF
Language: English
CLC classification: Epidemiology and epidemic prevention

Similar documents

1. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets. [J]. Marston L, Peacock JL, Yu K. Paediatric and perinatal epidemiology. 2009, no. 4.
2. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants [J]. Odile Sauzet, Janet L. Peacock. BMC Medical Research Methodology. 2017, no. 1.
3. A User Study to Compare Four Uncertainty Visualization Methods for 1D and 2D Datasets [J]. Sanyal Jibonananda, Zhang Song, Bhattacharya Gargi. Visualization and Computer Graphics, IEEE Transactions on. 2009, no. 6.
4. Comparing two density-based clustering methods for reducing very large spatio-temporal dataset [C]. Whelan Michael, Le-Khac Nhien-An, Kechadi M-Tahar. 2011 IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services. 2011.
5. Multi-Domain Clustering of Real-Valued Datasets. [D]. Hu, Zhen. 2011.
6. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants [O]. Odile Sauzet, Janet L. Peacock. 2017.
7. MCAM: multiple clustering analysis methodology for deriving hypotheses and insights from high-throughput proteomic datasets. [O]. Kristen M Naegle, Roy E Welsch, Michael B Yaffe. 2011.
