Model checking for incomplete high-dimensional categorical data (Incomplete data).

Abstract

Categorical data are often arranged in a contingency table and summarized by a loglinear model. A standard approach for comparing two competing models is to calculate twice the difference between their maximized loglikelihoods, a statistic that asymptotically follows a χ² distribution. When the data are sparse, however, the χ² approximation may be questionable.

As an alternative to a large-sample approximation of the reference distribution, we implement the framework introduced by Rubin (1984) for finding the posterior predictive check (PPC) distribution. The PPC distribution is the conditional distribution of a future value of a test statistic given the observed data and the model specification; it can serve as the reference distribution for the relevant likelihood-ratio statistics.

However, constructing a PPC distribution from a large number of replicates can be computationally demanding. This is especially true when the original data are incomplete, since generating each PPC replicate requires an involved statistical computation (we use a data-augmentation strategy). In practice, we propose approximating the PPC distribution by a gamma distribution whose parameters are estimated from a combination of training data and a modest-sized sample of PPC replicates. Some simulated examples suggest that this procedure, which can reduce the computation needed to approximate the PPC distribution by a factor of 20, has satisfactory statistical properties.
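The following is a minimal Python sketch (standard library only) of the three ingredients named above, written under simplifying assumptions that are not taken from the dissertation: complete data (so no data-augmentation step), a two-way table in which the independence loglinear model is tested against the saturated model, independent Dirichlet(1) priors on the row and column margins, and a gamma distribution fitted by the method of moments to the PPC replicates alone (the dissertation also uses training data to estimate the gamma parameters; that step is not reproduced). All function names and the example table are hypothetical.

```python
import math
import random


def g2_independence(table):
    """Likelihood-ratio statistic G^2 = 2 * sum O * log(O / E) comparing the
    independence model with the saturated model for a two-way table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # empty cells contribute 0 (0 * log 0 = 0)
                expected = row_tot[i] * col_tot[j] / n
                g2 += 2.0 * obs * math.log(obs / expected)
    return g2


def upper_gamma_q(a, x):
    """Regularized upper incomplete gamma Q(a, x) = 1 - P(a, x), with P
    computed by its power series; adequate for the moderate x used here."""
    if x <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    for n in range(1, 10000):
        term *= x / (a + n)
        total += term
        if term < total * 1e-15:
            break
    log_p = a * math.log(x) - x - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))


def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(draws)
    return [d / s for d in draws]


def ppc_replicates(table, n_rep, rng, prior=1.0):
    """G^2 on tables replicated from the posterior predictive distribution
    of the independence model. Assumes complete data and Dirichlet(prior)
    priors on each margin, so the posterior factorizes over the margins."""
    n = sum(sum(row) for row in table)
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    stats = []
    for _ in range(n_rep):
        p_row = dirichlet([r + prior for r in row_tot], rng)
        p_col = dirichlet([c + prior for c in col_tot], rng)
        weights = [p_row[i] * p_col[j] for i, j in cells]
        rep = [[0] * n_cols for _ in range(n_rows)]
        for i, j in rng.choices(cells, weights=weights, k=n):  # multinomial draw
            rep[i][j] += 1
        stats.append(g2_independence(rep))
    return stats


def gamma_ppc_pvalue(observed, replicates):
    """Method-of-moments gamma fit to a modest sample of PPC replicates;
    returns the fitted gamma's upper-tail probability at `observed`."""
    m = sum(replicates) / len(replicates)
    v = sum((s - m) ** 2 for s in replicates) / (len(replicates) - 1)
    shape, scale = m * m / v, v / m
    return upper_gamma_q(shape, observed / scale)


if __name__ == "__main__":
    rng = random.Random(1999)
    # Hypothetical sparse 3x4 table, made up for illustration only.
    table = [[3, 0, 1, 2], [0, 4, 1, 0], [2, 1, 0, 5]]
    obs = g2_independence(table)
    df = (len(table) - 1) * (len(table[0]) - 1)
    p_chi2 = upper_gamma_q(df / 2, obs / 2)  # chi^2_df upper tail = Q(df/2, x/2)
    reps = ppc_replicates(table, n_rep=50, rng=rng)
    p_empirical = sum(r >= obs for r in reps) / len(reps)
    p_gamma = gamma_ppc_pvalue(obs, reps)
    print(f"G2={obs:.2f}  chi2 p={p_chi2:.3f}  "
          f"PPC p={p_empirical:.3f}  gamma p={p_gamma:.3f}")
```

The gamma family is a natural choice for the approximation because the asymptotic χ² reference distribution is itself a gamma distribution (shape df/2, scale 2), so a fitted gamma can recover the χ² shape when the data are not sparse while adjusting the mean and variance when they are.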

Bibliographic details

Author: Hu, Ming-Yi
Affiliation: University of California, Los Angeles

Degree-granting institution: University of California, Los Angeles
Subject: Statistics; Mathematics; Education Mathematics
Degree: Ph.D.
Year: 1999
Pages: 87
Original format: PDF
Language: English
CLC classification: Statistics; Mathematics
Date added: 2022-08-17 11:47:56

Similar documents


1. Bayesian method for learning graphical models with incompletely categorical data [J]. Zhi Geng, Yang-Bo He, Xue-Li Wang. Computational Statistics & Data Analysis, 2003, no. 1a2.

2. A simple and fast alternative to the EM algorithm for incomplete categorical data and latent class models [J]. Andrzej T. Galecki, Thomas R. Ten Have, Geert Molenberghs. Computational Statistics & Data Analysis, 2001, no. 3.

3. A note on posterior predictive checks to assess model fit for incomplete data [J]. Xu Dandan, Chatterjee Arkendu, Daniels Michael. Statistics in Medicine, 2016, no. 27.

4. On Fuzzy Clustering for Incomplete Spherical Data and for Incomplete Multivariate Categorical Data [C]. Yuchi Kanzawa. International Symposium on Advanced Intelligent Systems; International Conference on Soft Computing and Intelligent Systems, 2018.

5. Handling Incomplete High-Dimensional Multivariate Longitudinal Data with Mixed Data Types by Multiple Imputation Using a Longitudinal Factor Analysis Model [D]. Lu, Xiang. 2016.

6. A Note on Posterior Predictive Checks to Assess Model Fit for Incomplete Data [O]. Dandan Xu, Arkendu Chatterjee, Michael J. Daniels.

7. CLINCH: Clustering Incomplete High-Dimensional Data for Data Mining Application [O]. Zunping Cheng, Ding Zhou, Chen Wang, 2012.

