Prediction of Binary Response Variable in Panel Data.

Abstract

The problem of interest is to predict y ∈ {0, 1} based on some characteristics x. The Gibbs posterior approach, which is constructed from an empirical classification risk and aims to minimize a risk function without modeling the data probabilistically, is a new direction for classification and prediction, even when x is high-dimensional.

In this dissertation, we study a panel data binary response model, which comprises n mutually independent sequences of a dependent process (strong mixing, or even more general than strong mixing) observed over T time periods, with a binary response variable taking the value 0 or 1. First, we extend two inequalities, the Bosq inequality (1993) and the Triplex inequality (2009), to this data structure to construct upper probability bounds for the pointwise deviation and the uniform deviation between the empirical risk and its expectation.

Then the Gibbs posterior is applied to the panel data binary response model to generate parameters b that define linear classifiers, which are used to classify and predict the response variable y. We discuss the asymptotic properties of the risk function for the proposed b in two scenarios: T → ∞, and n → ∞ with T not large. Near-optimal risk performance of the Gibbs posterior is achieved only when T → ∞; for the other scenario, a new marginalized risk is proposed, and the relation between the two risk measures is demonstrated.

We also study the convergence rate of risk minimization with variable selection in high dimensions, from both frequentist and Bayesian approaches. The Bayesian treatment is the Gibbs posterior, constructed directly from an empirical classification risk, which, unlike the classical Bayesian posterior, has a robustness property. The risk function converges to the optimal risk at a near-parametric rate that depends only on the sample size, despite the high dimensionality.

A simulation study examines the classification and prediction performance of the Gibbs posterior in the panel data binary response model with a random individual effect. Comparison with the classical Bayesian likelihood method confirms that the Gibbs posterior performs as well as the Bayesian method when the data-generating process is correctly modeled. When the data are generated from a model that is misspecified, the classification performance of the Gibbs posterior, which does not depend on the model assumption, is much better. We also find that increasing T reduces the prediction error more effectively than increasing n. Finally, we illustrate the Gibbs posterior method in a real data application on brand choice in yogurt purchases.
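
The Gibbs posterior idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes a density proportional to exp(-psi * N * R_N(b)) times a prior, where R_N(b) is the empirical misclassification rate of the linear rule 1{x'b > 0}. The standard normal prior, the random-walk Metropolis sampler, the temperature `psi`, the logistic noise, and all function names are illustrative assumptions.

```python
import numpy as np

def empirical_risk(b, X, y):
    """Fraction of observations misclassified by the linear rule 1{x'b > 0}."""
    return np.mean((X @ b > 0).astype(int) != y)

def gibbs_posterior_draws(X, y, psi=1.0, n_draws=3000, step=0.3, seed=0):
    """Random-walk Metropolis targeting exp(-psi * N * R_N(b)) * prior(b).

    Assumptions (not from the source): standard normal prior on b,
    Gaussian proposals, and a user-chosen temperature psi.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape

    def log_target(v):
        # Log Gibbs posterior up to an additive constant.
        return -psi * N * empirical_risk(v, X, y) - 0.5 * (v @ v)

    b = np.zeros(d)
    current = log_target(b)
    draws = np.empty((n_draws, d))
    for k in range(n_draws):
        proposal = b + step * rng.standard_normal(d)
        candidate = log_target(proposal)
        # Accept with probability min(1, target ratio).
        if np.log(rng.uniform()) < candidate - current:
            b, current = proposal, candidate
        draws[k] = b
    return draws

def simulate_panel(n=50, T=20, beta=(1.0, -1.0), seed=1):
    """Toy panel: y_it = 1{x_it'beta + u_i + e_it > 0}, u_i a random individual effect."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta)
    X = rng.standard_normal((n, T, beta.size))
    u = rng.standard_normal((n, 1))      # one effect per individual, shared over t
    eps = rng.logistic(size=(n, T))      # idiosyncratic noise (illustrative choice)
    y = (X @ beta + u + eps > 0).astype(int)
    # Stack the n individuals' T observations into N = n * T rows.
    return X.reshape(n * T, -1), y.reshape(n * T)

X, y = simulate_panel()
draws = gibbs_posterior_draws(X, y)
b_hat = draws[len(draws) // 2:].mean(axis=0)  # average after discarding burn-in
print("in-sample risk of averaged draw:", empirical_risk(b_hat, X, y))
```

Because the 0-1 risk depends only on the direction of b, the prior mainly controls the scale of the draws. The sketch also treats the stacked observations as a single sample and does not reproduce the dependence structure or the marginalized risk studied in the dissertation.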

Bibliographic details

Author: Yao, Lili
Author affiliation: Northwestern University
Degree-granting institution: Northwestern University
Subject: Statistics
Degree: Ph.D.
Year: 2011
Pages: 100 p.
Original format: PDF
Language: English
