Model-based clustering for multivariate time series of counts.

Abstract

This dissertation develops a modeling framework for univariate and multivariate zero-inflated time series of counts and applies the models in a clustering scheme to identify groups of count series with similar behavior. The basic modeling framework is observation-driven Poisson regression with a generalized linear model (GLM) structure. The zero-inflated Poisson (ZIP) model is used to account for more zeros than a Poisson distribution would predict, a common feature of count data. Combining these two methods yields models for time series of counts in which both the counts and the probability of an extra zero may depend on past observations and on exogenous covariates.

A key contribution of this work is a new modeling paradigm for multivariate zero-inflated counts. Three related models are considered: the jointly-inflated, the marginally-inflated, and the doubly-inflated multivariate Poisson. The doubly-inflated model encompasses both marginal inflation, which allows additional zeros in each individual count series at each time epoch, and joint inflation, which allows zero-inflation across all of the multivariate series at once. These models improve on previously proposed models, which are either too rigid or too simplistic for a wide range of applications. To estimate the model parameters, a new Monte Carlo Expectation-Maximization (MCEM) algorithm is developed. Monte Carlo sampling avoids the complex recursion formulas otherwise needed to calculate the probability function of the multivariate Poisson, and the algorithm is easily adapted to different multivariate zero-inflation schemes.

The new models, estimation methods, and clustering applications are demonstrated on simulated and real datasets. In a finance application, the number of trades and the number of price changes for bonds are modeled as a bivariate doubly zero-inflated Poisson time series, where observations of zero trades or zero price changes represent the liquidity risk of that bond. In an environmental science application, the new models are used in a model-based clustering scheme to study counts of high-pollution events at air quality monitoring stations around Houston, Texas. Clustering reveals regions of the air monitoring network that behave similarly in terms of time dependence and response to covariates representing atmospheric conditions and physical sources of air pollution.
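
The univariate building block described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the dissertation's specification: the link functions in the hypothetical `step_params` helper (a log-linear Poisson mean driven by log(1 + y_{t-1}) and one covariate, and a logistic extra-zero probability driven by an indicator of a previous zero and the same covariate) are illustrative choices, as is conditioning on a presample count of zero. It shows the ZIP probability function, simulation of an observation-driven ZIP series, and the conditional log-likelihood that a numerical optimizer could maximize.

```python
import math
import random


def zip_logpmf(y: int, lam: float, p: float) -> float:
    """Log-probability of count y under a zero-inflated Poisson law.

    With probability p the observation is an extra (structural) zero;
    otherwise it is Poisson(lam). Assumes lam > 0 and 0 <= p < 1.
    """
    if y == 0:
        return math.log(p + (1.0 - p) * math.exp(-lam))
    return math.log(1.0 - p) - lam + y * math.log(lam) - math.lgamma(y + 1)


def step_params(y_prev: int, x_t: float, beta, gamma):
    """ASSUMPTION: hypothetical observation-driven link functions.

    Poisson mean: log-linear in log(1 + y_{t-1}) and the covariate x_t.
    Extra-zero probability: logistic in 1{y_{t-1} = 0} and x_t.
    """
    lam = math.exp(beta[0] + beta[1] * math.log1p(y_prev) + beta[2] * x_t)
    eta = gamma[0] + gamma[1] * (1.0 if y_prev == 0 else 0.0) + gamma[2] * x_t
    p = 1.0 / (1.0 + math.exp(-eta))
    return lam, p


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for small lam."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate(x, beta, gamma, seed: int = 0):
    """Simulate a ZIP series driven by its own past and by x (presample y = 0)."""
    rng, y, prev = random.Random(seed), [], 0
    for x_t in x:
        lam, p = step_params(prev, x_t, beta, gamma)
        prev = 0 if rng.random() < p else poisson_draw(rng, lam)
        y.append(prev)
    return y


def loglik(y, x, beta, gamma) -> float:
    """Conditional log-likelihood, conditioning on a presample count of 0."""
    total, prev = 0.0, 0
    for y_t, x_t in zip(y, x):
        total += zip_logpmf(y_t, *step_params(prev, x_t, beta, gamma))
        prev = y_t
    return total


# Illustrative usage with an arbitrary covariate and parameter values.
x = [math.sin(t / 10.0) for t in range(200)]
beta, gamma = (0.5, 0.4, 0.3), (-1.5, 1.0, 0.5)
y = simulate(x, beta, gamma)
print(loglik(y, x, beta, gamma))
```

Because both the Poisson mean and the extra-zero probability are recomputed from the previous observation at every step, the same `loglik` function serves for simulation checks and, passed to any optimizer, for maximum likelihood fitting under these assumed links.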
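
A second sketch illustrates the multivariate ideas for two series. It assumes a common-shock bivariate Poisson (Y1 = X1 + X0, Y2 = X2 + X0, with a shared Poisson shock X0) and one possible ordering of the two inflation mechanisms: joint inflation sets the whole vector to zero with probability `pj`; otherwise each margin is set to zero independently with probabilities `p1` and `p2`. All function names and parameters are hypothetical, and this is not the dissertation's MCEM algorithm. The function `bp_pmf_mc` only shows the underlying Monte Carlo idea: instead of summing over the latent shared shock (a short sum in two dimensions, but the multivariate Poisson probability function otherwise requires complex recursion formulas), sample the shock and average.

```python
import math
import random


def pois_pmf(k: int, lam: float) -> float:
    """Poisson(lam) probability of k; zero for negative k. Assumes lam > 0."""
    if k < 0:
        return 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for small lam."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def bp_pmf(y1, y2, l1, l2, l0):
    """Exact common-shock bivariate Poisson pmf: Y1 = X1 + X0, Y2 = X2 + X0."""
    return sum(pois_pmf(y1 - k, l1) * pois_pmf(y2 - k, l2) * pois_pmf(k, l0)
               for k in range(min(y1, y2) + 1))


def bp_pmf_mc(y1, y2, l1, l2, l0, draws=20000, seed=0):
    """Monte Carlo estimate of the same pmf: sample the shared shock X0
    and average the conditional probabilities instead of summing over it."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        k = poisson_draw(rng, l0)
        total += pois_pmf(y1 - k, l1) * pois_pmf(y2 - k, l2)
    return total / draws


def dzibp_pmf(y1, y2, l1, l2, l0, pj, p1, p2):
    """pmf of the ASSUMED doubly-inflated construction (see dzibp_draw)."""
    q = (1 - p1) * (1 - p2) * bp_pmf(y1, y2, l1, l2, l0)  # no margin zeroed
    if y1 == 0:
        q += p1 * (1 - p2) * pois_pmf(y2, l2 + l0)  # only series 1 zeroed
    if y2 == 0:
        q += (1 - p1) * p2 * pois_pmf(y1, l1 + l0)  # only series 2 zeroed
    if y1 == 0 and y2 == 0:
        q += p1 * p2                                 # both margins zeroed
    joint = pj if (y1 == 0 and y2 == 0) else 0.0
    return joint + (1 - pj) * q


def dzibp_draw(rng, l1, l2, l0, pj, p1, p2):
    """ASSUMPTION: joint inflation zeroes the whole vector with prob. pj;
    otherwise each margin of a bivariate Poisson draw is zeroed
    independently with prob. p1 and p2."""
    if rng.random() < pj:
        return 0, 0
    x0 = poisson_draw(rng, l0)  # shared shock induces positive correlation
    u1, u2 = poisson_draw(rng, l1) + x0, poisson_draw(rng, l2) + x0
    return (0 if rng.random() < p1 else u1), (0 if rng.random() < p2 else u2)


# Illustrative usage: l1, l2, l0, pj, p1, p2 (arbitrary values).
params = (1.5, 0.8, 0.5, 0.2, 0.1, 0.3)
rng = random.Random(1)
sample = [dzibp_draw(rng, *params) for _ in range(20000)]
print(sample.count((0, 0)) / len(sample), dzibp_pmf(0, 0, *params))
```

Under these assumptions the pmf sums to one over all (y1, y2), and the empirical frequency of (0, 0) in a large simulated sample should be close to `dzibp_pmf(0, 0, ...)`. Setting `p1 = p2 = 0` recovers a jointly-inflated model, and `pj = 0` a marginally-inflated one.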

Bibliographic record

  • Author: Thomas, Sarah Julia
  • Author affiliation: Rice University
  • Degree-granting institution: Rice University
  • Subject: Statistics
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 211
  • Original format: PDF
  • Language: English