A Fully Bayesian Sparse Probit Model for Text Categorization

Behrouz Madahian; Usef Faghihi

Abstract

A common problem when processing data sets with a large number of covariates relative to a small sample size (fat data sets) is estimating the parameters associated with each covariate. When the number of covariates far exceeds the number of samples, parameter estimation becomes very difficult. Researchers in many fields, such as text categorization, face the burden of finding and estimating important covariates without overfitting the model. In this study, we developed a Sparse Probit Bayesian Model (SPBM) based on Gibbs sampling, which uses double exponential priors to induce shrinkage and reduce the number of covariates in the model. The method was evaluated on ten domains, such as mathematics, whose corpora were downloaded from Wikipedia. From the downloaded corpora, we created the TF-IDF matrix corresponding to all domains and divided the whole data set randomly into training and testing groups of size 300. To make the model more robust, we performed 50 re-samplings of the training and test groups. The model was implemented in R; the Gibbs sampler ran for 60k iterations, and the first 20k were discarded as burn-in. We performed classification on the training and test groups by calculating P(y_i = 1) and, following [1] [2], used a threshold of 0.5 as the decision rule. Our model's performance was compared to Support Vector Machines (SVM) using average sensitivity and specificity across the 50 runs. The SPBM achieved high classification accuracy and outperformed SVM in almost all domains analyzed.
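
The decision rule and the evaluation metrics described above can be illustrated with a minimal sketch. It is written in Python rather than the authors' R, and it is not their implementation. It assumes that P(y_i = 1) is estimated by averaging the probit probability Phi(x_i . beta) over the retained Gibbs draws of beta, and that a document is assigned to the class when this probability exceeds 0.5. The function names, the toy feature rows, and the toy posterior draws are all hypothetical.

```python
import math


def probit_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_prob(x, beta_draws):
    """Estimate P(y = 1 | x) by averaging Phi(x . beta) over posterior draws.

    Assumption: beta_draws holds the draws kept after burn-in.
    """
    probs = [
        probit_cdf(sum(xj * bj for xj, bj in zip(x, beta)))
        for beta in beta_draws
    ]
    return sum(probs) / len(probs)


def classify(X, beta_draws, threshold=0.5):
    """Label a row 1 when its estimated probability exceeds the threshold."""
    return [1 if posterior_prob(x, beta_draws) > threshold else 0 for x in X]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical toy data: 4 documents, 3 TF-IDF-like features.
X = [
    [0.9, 0.1, 0.0],
    [0.0, 0.3, 0.7],
    [0.5, 0.0, 0.1],
    [0.1, 0.2, 0.4],
]
y_true = [1, 0, 1, 1]

# Hypothetical retained draws of beta; the second coefficient is shrunk
# to zero, as a sparsity-inducing prior would tend to do for an
# irrelevant term.
beta_draws = [
    [1.0, 0.0, -1.0],
    [1.2, 0.0, -0.8],
]

y_pred = classify(X, beta_draws)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(y_pred, sens, spec)
```

Averaging the two metrics over repeated random train/test splits, as the abstract describes for 50 runs, would amount to calling `sensitivity_specificity` once per split and taking the mean of each component.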

Bibliographic details

Source
《统计学期刊（英文）》 (Journal of Statistics, English edition) | 2014, Issue 8 | pp. 611-619 | 9 pages
Authors
Behrouz Madahian; Usef Faghihi
Author affiliations
Department of Mathematical Sciences, University of Memphis, Memphis, TN, USA; Department of Computing and Technology, Cameron University, Lawton, OK, USA
Original format: PDF
Text language: chi
Chinese Library Classification: Oncology
Keywords
Bayesian; LASSO; Shrinkage; Parameter Estimation; Generalized Linear Models; Machine Learning
