Spatial probit models for multivariate ordinal data: Computational efficiency and parameter identifiability.

Abstract

The Colorado Natural Heritage Program (CNHP) at Colorado State University evaluates Colorado's rare and at-risk species and habitats and promotes conservation of biological resources. One of the goals of the program is to determine the condition of wetlands across the state of Colorado. The data collected are measurements, or metrics, representing landscape condition, biotic condition, hydrologic condition, and physiochemical condition in river basins statewide. The metrics differ in variable type, including binary, ordinal, count, and continuous response data. It is common practice to uniformly discretize the metrics into ordinal values and combine them using a weighted average to obtain a univariate measure of wetland condition. The weights assigned to each metric are based on best professional judgment.

The motivation of this work was to improve on the user-defined weights by developing a statistical model to estimate the weights using observed data. The challenges of creating a model that fulfills this requirement are many. First, the observed data are multivariate and consist of different variable types, which we wish to preserve. Second, the multivariate response data are not independent across river basins because wetlands in close proximity are correlated. Third, we want the model to provide a univariate measure of wetland condition that can be compared across the state. Lastly, it is of interest to the ecologists to predict the univariate measure of wetland condition at unobserved locations, which requires covariate information to be incorporated into the model.

We propose a multivariate multilevel latent variable model to address these challenges. Latent continuous response variables are used to model the different types of response variables. An additional latent variable, or common factor, is used as a univariate measure of wetland condition. The mean of the common factor contains observable covariate data in order to predict at unobserved locations. The variance of the common factor is defined by a spatial covariance function to account for the dependence between wetlands.

The majority of the metrics reported by the CNHP are ordinal. Therefore, our primary focus is modeling multivariate ordinal response data, of which binary data are a special case. Probit linear models and probit linear mixed models are examples of models for ordinal response data. Probit models are attractive in that they can be defined in terms of latent variables.

Computational efficiency is a major issue when fitting multivariate latent variable models in a Bayesian framework using Markov chain Monte Carlo (MCMC). There is also a high computational cost for running MCMC when fitting geostatistical spatial models. Data augmentation and parameter expansion are both modeling techniques that can lead to optimal iterative sampling algorithms for MCMC. Data augmentation allows for simpler and more feasible simulation from a posterior distribution.

Parameter expansion is a method for accelerating convergence of iterative sampling algorithms and can enhance data augmentation algorithms. We propose data augmentation and parameter-expanded data augmentation algorithms for fitting spatial probit models for binary and ordinal response data by MCMC. Parameter identifiability is another challenge when fitting multivariate latent variable models due to the multivariate response data, number of parameters, unobserved latent variables, and spatial random effects. We investigate parameter identifiability for the common factor model for multivariate ordinal response data. We extend the common factor model to include covariates and spatial correlation so we can predict wetland condition at unobserved locations. The partial sill and range parameters of a spatial covariance function are difficult to estimate because they are near-nonidentifiable. We propose a new parameterization for the covariance function of the spatial probit model that leads to better mixing and faster convergence of the MCMC.

Whereas our spatial probit model for ordinal response data follows the common factor model approach, there are other forms of the spatial probit model. We give a comprehensive comparison of two types of spatial probit models, which we refer to as the first-stage and second-stage spatial probit models. We discuss the implications of fitting each model and compare them in terms of their impact on parameter estimation and prediction at unobserved locations. We propose a new approximation for predicting ordinal response data that is both accurate and efficient.

We apply the multivariate multilevel latent variable model to data collected in the North Platte and Rio Grande River Basins to evaluate wetland condition. We obtain statistically derived weights for each of the response metrics with confidence limits. Lastly, we predict the univariate measure of wetland condition at unobserved locations.
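
The data augmentation idea mentioned in the abstract can be illustrated with a minimal, non-spatial sketch for a binary probit regression. This is the textbook latent-variable Gibbs sampler, not the parameter-expanded or spatial algorithms developed in the thesis. The function names `sample_latent` and `probit_gibbs`, the N(0, prior_var * I) prior on the coefficients, and the naive rejection step for the truncated normal draws are all assumptions made for illustration only.

```python
import numpy as np


def sample_latent(mu, y, rng):
    """Draw z_i ~ N(mu_i, 1) restricted to z_i > 0 if y_i == 1, else z_i <= 0.

    Uses naive rejection sampling (an illustrative simplification; it is
    slow when |mu_i| is large and the observed sign is unlikely).
    """
    z = mu + rng.standard_normal(mu.shape)
    bad = (z > 0) != (y == 1)
    while bad.any():
        z[bad] = mu[bad] + rng.standard_normal(bad.sum())
        bad = (z > 0) != (y == 1)
    return z


def probit_gibbs(X, y, n_iter=1000, prior_var=100.0, seed=0):
    """Gibbs sampler for y_i = 1{z_i > 0}, z_i = x_i' beta + e_i, e_i ~ N(0, 1).

    Assumes (hypothetically) a N(0, prior_var * I) prior on beta.
    Returns an (n_iter, p) array of beta draws.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Conditional covariance of beta given z does not depend on z.
    V = np.linalg.inv(X.T @ X + np.eye(p) / prior_var)
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # Step 1: augment the data with latent continuous responses.
        z = sample_latent(X @ beta, y, rng)
        # Step 2: beta | z is Gaussian, as in ordinary linear regression.
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```

Calling `probit_gibbs(X, y)` with an `(n, p)` design matrix `X` and a 0/1 response vector `y` returns posterior draws of the regression coefficients; early draws would normally be discarded as burn-in. Conditioning on the latent `z` is what makes the coefficient update a simple Gaussian draw, which is the sense in which augmentation simplifies simulation from the posterior.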

Bibliographic record

  • Author

    Schliep, Erin M.

  • Author affiliation

    Colorado State University

  • Degree-granting institution: Colorado State University
  • Discipline: Statistics
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 209 p.
  • Total pages: 209
  • Original format: PDF
  • Language of text: English
