Open Journal of Statistics

A Fully Bayesian Sparse Probit Model for Text Categorization



Abstract

A common problem in processing data sets with a large number of covariates relative to a small sample size ("fat" data sets) is estimating the parameters associated with each covariate. When the number of covariates far exceeds the number of samples, parameter estimation becomes very difficult. Researchers in many fields, such as text categorization, face the burden of finding and estimating important covariates without overfitting the model. In this study, we developed a Sparse Probit Bayesian Model (SPBM) based on Gibbs sampling that uses double exponential (Laplace) priors to induce shrinkage and reduce the number of covariates in the model. The method was evaluated on ten domains, such as mathematics, whose corpora were downloaded from Wikipedia. From the downloaded corpora, we built the TF-IDF matrix for all domains and randomly divided the data set into training and test groups of size 300. To make the evaluation more robust, we performed 50 re-samplings of the training/test split. The model was implemented in R; the Gibbs sampler ran for 60,000 iterations, with the first 20,000 discarded as burn-in. We classified the training and test groups by computing P(y_i = 1) and, following [1] [2], used a threshold of 0.5 as the decision rule. The model's performance was compared to Support Vector Machines (SVM) using average sensitivity and specificity across the 50 runs. The SPBM achieved high classification accuracy and outperformed SVM in almost all domains analyzed.
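The decision rule and evaluation described in the abstract — classifying a document as positive when P(y_i = 1) = Φ(x_i′β) exceeds 0.5, then scoring each run by sensitivity and specificity — can be sketched as follows. This is a minimal illustrative Python sketch (the paper's own implementation is in R); the function names are hypothetical, and the Gibbs sampler that actually draws β under the Laplace prior is omitted.

```python
import math

def probit_prob(x, beta):
    """P(y = 1 | x) = Phi(x' beta): the standard normal CDF of the linear score."""
    score = sum(xi * bi for xi, bi in zip(x, beta))
    return 0.5 * (1.0 + math.erf(score / math.sqrt(2.0)))

def classify(x, beta, threshold=0.5):
    """Decision rule from the abstract: predict 1 when P(y = 1) >= 0.5."""
    return 1 if probit_prob(x, beta) >= threshold else 0

def sensitivity_specificity(y_true, y_pred):
    """Per-run evaluation metrics used to compare SPBM against SVM."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

In the paper, β would be a posterior summary from the Gibbs sampler (after burn-in) rather than a fixed vector, and x would be a row of the TF-IDF matrix; averaging the sensitivity and specificity over the 50 re-sampled splits gives the reported comparison figures.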
