Text Classification Using Novel 'Anti-Bayesian' Techniques

Abstract

This paper presents a non-traditional "Anti-Bayesian" solution to the traditional Text Classification (TC) problem. Historically, all recorded TC schemes have followed the same fundamental paradigm: once the statistical features are inferred from the syntactic/semantic indicators, the classifiers themselves are the well-established statistical ones. In this paper, we demonstrate that, by virtue of the skewed distributions of the features, one can advantageously work with the information latent in certain "non-central" quantiles (i.e., those distant from the mean) of the distributions. We show that such classifiers exist and are attainable, and that their design and implementation fit the recently introduced paradigm of Quantile Statistics (QS)-based classifiers. These classifiers, referred to as Classification by Moments of Quantile Statistics (CMQS), are essentially "Anti"-Bayesian in their modus operandi. To achieve our goal, we demonstrate the power and potential of CMQS to describe the very high-dimensional TC-related vector spaces in terms of a limited number of "outlier-based" statistics. Thereafter, the Pattern Recognition (PR) task in classification invokes the CMQS classifier for the underlying multi-class problem by using a linear number of pair-wise CMQS-based classifiers. Through rigorous testing on the standard 20-Newsgroups corpus, we show that CMQS-based TC attains accuracy comparable to that of the best-reported classifiers. We also suggest the potential of fusing the results of a CMQS-based method with those obtained from a traditional scheme.
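The idea of classifying with "non-central" quantiles can be illustrated with a minimal one-dimensional, two-class sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the choice of the 1/3 and 2/3 quantiles, the nearest-reference-point decision rule, and all function names (`quantile`, `fit_quantile_pair`, `classify`) are assumptions made purely for illustration.

```python
from statistics import mean


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already-sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def fit_quantile_pair(class_a, class_b, p=1 / 3):
    """Pick one reference point per class, away from each class mean.

    Assumption: the class with the smaller mean is represented by its
    (1 - p) quantile and the other class by its p quantile, so both
    reference points lie on the side facing the opposite class.
    """
    a, b = sorted(class_a), sorted(class_b)
    if mean(a) <= mean(b):
        return quantile(a, 1 - p), quantile(b, p)
    return quantile(a, p), quantile(b, 1 - p)


def classify(x, ref_a, ref_b):
    """Assign x to the class whose reference quantile is nearer."""
    return "A" if abs(x - ref_a) <= abs(x - ref_b) else "B"


# Toy one-dimensional feature values (hypothetical data, not from the paper).
class_a = [0, 1, 2, 3, 4, 5, 6]
class_b = [10, 11, 12, 13, 14, 15, 16]
ref_a, ref_b = fit_quantile_pair(class_a, class_b)

for sample in (7.0, 9.0):
    print(sample, classify(sample, ref_a, ref_b))
```

In this toy setting each class is summarized by a single quantile rather than by its mean, which is the sense in which the decision relies on points distant from the center of each distribution. Extending the sketch to many features and to a multi-class problem through pair-wise classifiers, as the abstract describes, would need design decisions that the abstract does not specify.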

Bibliographic record

Source
International conference on computational collective intelligence | 2015 | pp. 1-15 | 15 pages
Authors
B. John Oommen; Richard Khoury; Aron Schmidt
Original format: PDF
Keywords
Text classification; Quantile statistics (QS); Classification by the moments of QS (CMQS)

Similar literature

1. On the classification of dynamical data streams using novel "Anti-Bayesian" techniques [J]. Hammer Hugo Lewi, Yazidi Anis, Oommen B. John. Pattern Recognition: The Journal of the Pattern Recognition Society. 2018.
2. A Comparison of Text-Classification Techniques Applied to Arabic Text [J]. Ghassan Kanaan, Riyad Al-Shalabi, Sameh Ghwanmeh. Journal of the American Society for Information Science and Technology. 2009, Issue 9.
3. Performance Comparison and Optimization of Text Document Classification using k-NN and Naïve Bayes Classification Techniques [J]. Zulfany Erlisa Rasjid, Reina Setiawan. Procedia Computer Science. 2017, Issue 22.
4. Text Classification Using Novel 'Anti-Bayesian' Techniques [C]. B. John Oommen. International conference on computational collective intelligence. 2015.
5. Kernel methods and semantic techniques for clinical text classification [D]. Garla, Vijay. 2012.
6. Combining Text Classification and Hidden Markov Modeling Techniques for Structuring Randomized Clinical Trial Abstracts [O]. Rong Xu, Kaustubh Supekar, Yang Huang. 2006.
7. Text Classification Using Novel "Anti-Bayesian" Techniques [O]. Oommen, John; Khoury, Richard; Schmidt, Aron. 2015.
