Journal: Journal of Machine Learning Research

One-Class SVMs for Document Classification


Abstract

We implemented versions of the SVM appropriate for one-class classification in the context of information retrieval. The experiments were conducted on the standard Reuters data set. For the SVM implementation we used both a version of Schoelkopf et al. and a somewhat different version of one-class SVM based on identifying "outlier" data as representative of the second class. We report on experiments with different kernels for both of these implementations and with different representations of the data, including binary vectors, tf-idf representation and a modification called "Hadamard" representation. We then compared these with one-class versions of the prototype (Rocchio), nearest neighbor, and naive Bayes algorithms, and finally with a natural one-class neural network classification method based on "bottleneck" compression generated filters. The SVM approach as represented by Schoelkopf was superior to all the methods except the neural network one, where it was, although occasionally worse, essentially comparable. However, the SVM methods turned out to be quite sensitive to the choice of representation and kernel in ways which are not well understood; therefore, for the time being, this leaves the neural network approach as the most robust.
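
To make the one-class setting concrete, the following is a minimal sketch of a one-class prototype (Rocchio-style) classifier over tf-idf vectors, trained on positive documents only. It is not the paper's implementation: the whitespace tokenizer, the smoothed idf formula, the toy documents, and the rule of setting the acceptance threshold to the lowest training similarity are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately crude stand-in)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency, computed from positive docs only."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(doc, idf):
    """Unit-length sparse tf-idf vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: f * idf[t] for t, f in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm > 0 else {}


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def fit_prototype(positive_docs):
    """Return (idf, prototype, threshold) learned from positive docs only."""
    idf = fit_idf(positive_docs)
    vectors = [tfidf_vector(d, idf) for d in positive_docs]
    total = Counter()
    for vec in vectors:
        for t, v in vec.items():
            total[t] += v
    norm = math.sqrt(sum(v * v for v in total.values()))
    prototype = {t: v / norm for t, v in total.items()}
    # Assumed rule: accept anything at least as similar to the prototype
    # as the least similar training document.
    threshold = min(dot(vec, prototype) for vec in vectors)
    return idf, prototype, threshold


def score(doc, model):
    """Cosine similarity between the document and the class prototype."""
    idf, prototype, _ = model
    return dot(tfidf_vector(doc, idf), prototype)


def accepts(doc, model):
    """True if the document is judged to belong to the single known class."""
    return score(doc, model) >= model[2]


# Hypothetical toy data standing in for one Reuters category.
positive_docs = [
    "oil prices rise as crude supply falls",
    "crude oil output cut lifts prices",
    "opec agrees oil supply cut",
]
model = fit_prototype(positive_docs)
print(accepts("football team wins cup final", model))
```

A document sharing no vocabulary with the training set scores zero and is rejected, while every training document is accepted by construction. In practice the threshold would more likely be tuned on held-out data, and a one-class SVM would replace the single prototype with a kernel-based decision boundary.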
