Journal: Statistics and Computing

Learning causal structure from mixed data with missing values using Gaussian copula models

Abstract

We consider the problem of causal structure learning from data with missing values, assumed to be drawn from a Gaussian copula model. First, we extend the 'Rank PC' algorithm, designed for Gaussian copula models with purely continuous data (so-called nonparanormal models), to incomplete data by applying rank correlation to pairwise complete observations and replacing the sample size with an effective sample size in the conditional independence tests to account for the information loss from missing values. When the data are missing completely at random (MCAR), we provide an error bound on the accuracy of 'Rank PC' and show its high-dimensional consistency. However, when the data are missing at random (MAR), 'Rank PC' fails dramatically. Therefore, we propose a Gibbs sampling procedure to draw correlation matrix samples from mixed data that still works correctly under MAR. These samples are translated into an average correlation matrix and an effective sample size, resulting in the 'Copula PC' algorithm for incomplete data. A simulation study shows that (1) 'Copula PC' estimates a more accurate correlation matrix and causal structure than 'Rank PC' under MCAR and, even more so, under MAR, and (2) the use of the effective sample size significantly improves the performance of 'Rank PC' and 'Copula PC'. We illustrate our methods on two real-world datasets: riboflavin production data and chronic fatigue syndrome data.
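The first step described in the abstract (rank correlation on pairwise complete observations, followed by a conditional independence test that uses an effective sample size) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `pairwise_rank_correlation` and `fisher_z_pvalue` are hypothetical, the data are simulated, and the effective sample size is taken here as the mean number of pairwise complete rows, which is an assumption of this sketch rather than the paper's definition. The mapping 2 sin(pi * rho / 6) from Spearman's rho to the Gaussian copula correlation and the Fisher z test are standard choices for rank-based PC variants.

```python
import math
import numpy as np


def pairwise_rank_correlation(X):
    """Correlation matrix from Spearman's rho on pairwise complete rows.

    X is an (n, p) float array with np.nan marking missing entries.
    Returns (R, counts), where counts[j, k] is the number of rows in
    which both column j and column k are observed.
    """
    n, p = X.shape
    observed = ~np.isnan(X)
    R = np.eye(p)
    counts = np.zeros((p, p), dtype=int)
    for j in range(p):
        counts[j, j] = observed[:, j].sum()
        for k in range(j + 1, p):
            both = observed[:, j] & observed[:, k]
            counts[j, k] = counts[k, j] = both.sum()
            # Ranks via double argsort; adequate for continuous data without ties.
            rj = np.argsort(np.argsort(X[both, j]))
            rk = np.argsort(np.argsort(X[both, k]))
            rho = np.corrcoef(rj, rk)[0, 1]
            # Map Spearman's rho to the Gaussian copula correlation.
            R[j, k] = R[k, j] = 2.0 * math.sin(math.pi * rho / 6.0)
    return R, counts


def fisher_z_pvalue(R, i, j, cond, n_eff):
    """Two-sided p-value for zero partial correlation of i and j given cond.

    Assumes the selected submatrix of R is positive definite, which a
    pairwise complete estimate does not guarantee in general.
    """
    idx = [i, j] + list(cond)
    precision = np.linalg.inv(R[np.ix_(idx, idx)])
    r = -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
    z = math.sqrt(n_eff - len(cond) - 3) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Simulated chain X -> Y -> Z with non-Gaussian margins (illustrative data only).
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)
z = 0.8 * y + 0.6 * rng.standard_normal(n)
data = np.column_stack([np.exp(x), y, z ** 3])  # monotone transforms keep ranks

# MCAR: every entry is dropped independently with probability 0.2.
data[rng.random(data.shape) < 0.2] = np.nan

R, counts = pairwise_rank_correlation(data)
# Assumed effective sample size: mean pairwise complete count over all pairs.
n_eff = counts[np.triu_indices(3, k=1)].mean()

p_marg = fisher_z_pvalue(R, 0, 2, [], n_eff)   # X vs Z, no conditioning
p_cond = fisher_z_pvalue(R, 0, 2, [1], n_eff)  # X vs Z given Y
print(f"n_eff={n_eff:.0f}  p(X,Z)={p_marg:.3g}  p(X,Z|Y)={p_cond:.3g}")
```

In this simulated chain, X and Z are dependent marginally but independent given Y, so a PC-style search would keep the X-Z edge at the first level and consider removing it once Y enters the conditioning set. The sketch covers only the MCAR setting; it does not attempt the Gibbs sampling procedure that the abstract proposes for MAR data.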
