Conference: International Conference on Computational Collective Intelligence

Text Classification Using Novel 'Anti-Bayesian' Techniques

Abstract

This paper presents a non-traditional "Anti-Bayesian" solution to the traditional Text Classification (TC) problem. Historically, all recorded TC schemes have followed the same fundamental paradigm: once the statistical features are inferred from the syntactic/semantic indicators, the classifiers themselves are well-established statistical ones. We demonstrate that, by virtue of the skewed distributions of the features, one can work advantageously with the information latent in certain "non-central" quantiles (i.e., those distant from the mean) of the distributions. We show that such classifiers exist and are attainable, and that they can be designed and implemented within the recently introduced paradigm of Quantile Statistics (QS)-based classifiers. These classifiers, referred to as Classification by Moments of Quantile Statistics (CMQS), are essentially "Anti"-Bayesian in their modus operandi. To this end, we demonstrate the power and potential of CMQS to describe the very high-dimensional TC-related vector spaces in terms of a limited number of "outlier-based" statistics. Thereafter, the pattern recognition (PR) task in classification invokes the CMQS classifier for the underlying multi-class problem by using a linear number of pair-wise CMQS-based classifiers. Through rigorous testing on the standard 20-Newsgroups corpus, we show that CMQS-based TC attains accuracy comparable to that of the best-reported classifiers. We also discuss the potential of fusing the results of a CMQS-based method with those obtained from a traditional scheme.
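
To make the idea of classifying with "non-central" quantiles concrete, the following is a minimal one-dimensional, two-class sketch. It is an illustration under stated assumptions, not the authors' CMQS implementation: the function names (`quantile`, `fit_reference_points`, `classify`), the choice of the 1/3 and 2/3 quantiles, and the toy data are all hypothetical. Each class is summarized by a single quantile on the side facing the other class, and a test point is assigned to the class whose reference quantile is nearer, rather than to the class with the nearer mean.

```python
from statistics import median


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] + frac * (s[hi] - s[lo])


def fit_reference_points(class_a, class_b, q=1 / 3):
    """Pick one non-central quantile per class (hypothetical choice of q).

    The class lying to the left contributes its upper (1 - q) quantile and
    the class lying to the right contributes its lower q quantile, so both
    reference points sit away from the class centres.
    """
    if median(class_a) <= median(class_b):
        return quantile(class_a, 1 - q), quantile(class_b, q)
    return quantile(class_a, q), quantile(class_b, 1 - q)


def classify(x, ref_a, ref_b):
    """Assign x to the class whose reference quantile is closer."""
    return "a" if abs(x - ref_a) <= abs(x - ref_b) else "b"


# Toy one-dimensional feature values (made up for illustration only).
samples_a = [0, 1, 2, 3, 4, 5, 6]
samples_b = [10, 11, 12, 13, 14, 15, 16]

ref_a, ref_b = fit_reference_points(samples_a, samples_b)
label = classify(7.0, ref_a, ref_b)
```

In a text-classification setting the same comparison would have to be applied across many feature dimensions and combined over pairs of classes; how the paper does this is not specified in the abstract, so it is left out of the sketch.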