Journal: Statistics in Medicine

Marginalized multilevel hurdle and zero-inflated models for overdispersed and correlated count data with excess zeros


Abstract

Count data are collected repeatedly over time in many applications, such as biology, epidemiology, and public health. Such data are often characterized by the following three features. First, correlation due to the repeated measures is usually accounted for using subject-specific random effects, which are assumed to be normally distributed. Second, the sample variance may exceed the mean, so the theoretical mean-variance relationship is violated, leading to overdispersion. This is usually accommodated through a hierarchical approach that combines a Poisson model with gamma-distributed random effects. Third, an excess of zeros beyond what standard count distributions can predict is often handled by either the hurdle or the zero-inflated model. A zero-inflated model assumes two processes as sources of zeros and combines a count distribution with a discrete point mass as a mixture, while the hurdle model handles zero observations and positive counts separately, using a truncated-at-zero count distribution for the non-zero state. In practice, however, all three features can appear simultaneously. Hence, a modeling framework that incorporates all three is necessary, and this presents challenges for the data analysis. Such models, when conditionally specified, naturally have a subject-specific interpretation. However, adopting their purposefully modified marginalized versions leads to a direct marginal or population-averaged interpretation for parameter estimates of covariate effects, which is the primary interest in many applications. In this paper, we present a marginalized hurdle model and a marginalized zero-inflated model for correlated and overdispersed count data with excess zero observations and then illustrate these further with two case studies.
The first dataset focuses on the Anopheles mosquito density around a hydroelectric dam, while adolescents' involvement in work, to earn money and support their families or themselves, is studied in the second example. Sub-models, which result from omitting zero-inflation and/or overdispersion features, are also considered for comparison. Analysis of the two datasets showed that accounting for the correlation, overdispersion, and excess zeros simultaneously resulted in a better fit to the data and, more importantly, that omission of any of them leads to incorrect marginal inference and erroneous conclusions about covariate effects.
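
The distinction between the zero-inflated and hurdle constructions described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it uses a plain Poisson count component with no random effects and no gamma overdispersion term, and the function names and parameter values (`lam`, `pi`, `p0`) are assumptions chosen for illustration only.

```python
import math


def poisson_pmf(y, lam):
    """Standard Poisson probability P(Y = y) with rate lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: mixture of a point mass at zero (weight pi)
    and a Poisson(lam) count, so zeros arise from two sources."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base


def hurdle_pmf(y, lam, p0):
    """Hurdle Poisson: zeros occur with probability p0; positive counts
    follow a Poisson(lam) truncated at zero."""
    if y == 0:
        return p0
    return (1.0 - p0) * poisson_pmf(y, lam) / (1.0 - math.exp(-lam))


def zip_mean(lam, pi):
    """Overall (marginal) mean of the zero-inflated Poisson."""
    return (1.0 - pi) * lam


def hurdle_mean(lam, p0):
    """Overall (marginal) mean of the hurdle Poisson."""
    return (1.0 - p0) * lam / (1.0 - math.exp(-lam))


if __name__ == "__main__":
    lam, pi, p0 = 2.5, 0.3, 0.4  # illustrative values, not from the paper
    support = range(60)          # tail mass beyond 60 is negligible here
    print(sum(zip_pmf(y, lam, pi) for y in support))
    print(sum(hurdle_pmf(y, lam, p0) for y in support))
    print(zip_mean(lam, pi), hurdle_mean(lam, p0))
```

In both constructions the overall mean depends jointly on the zero component and the count component, so a covariate effect on `lam` alone is not an effect on the overall mean. That is one way to see why a marginalized formulation, which places the regression directly on the overall mean, gives the population-averaged interpretation the abstract refers to.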
