Venue: International ACM SIGIR Conference on Research and Development in Information Retrieval

Cluster-Based One-Class Ensemble for Classification Problems in Information Retrieval



Abstract

A number of relevant information retrieval classification problems are one-class classification problems at heart. That is, labeled data is available for only one class, the so-called target class, and common discrimination-based classification approaches, be they binary or multiclass, are not applicable. Achieving high effectiveness on one-class problems is difficult in general, and it becomes even more challenging when the target class data is multimodal, which is often the case. To address these concerns we propose a cluster-based one-class ensemble that consists of four steps: (1) applying a clustering algorithm to the target class data, (2) training an individual one-class classifier for each of the identified clusters, (3) aggregating the decisions of the individual classifiers, and (4) selecting the best-fitting clustering model. We evaluate our approach with four datasets: an artificially generated dataset, a dataset compiled from a known multiclass text corpus, and two datasets related to one-class problems that have received much attention recently, namely authorship verification and quality flaw prediction. Our approach outperforms a one-class SVM on all four datasets.
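The four steps can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the per-cluster classifier, the aggregation rule, or the selection criterion. The sketch assumes plain k-means for clustering, a hypothetical centroid-plus-radius classifier (`CentroidOneClass`) in place of a real one-class learner, an "accept if any member accepts" aggregation, and a selection rule that picks the candidate with the smallest covered volume among those that still accept enough held-out target data. All function names and parameters are illustrative.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Step 1 (assumed algorithm): Lloyd's k-means, returns non-empty clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


class CentroidOneClass:
    """Step 2 (hypothetical stand-in): accept points within a radius of the centroid."""

    def __init__(self, points, quantile=0.95):
        self.center = tuple(sum(c) / len(points) for c in zip(*points))
        dists = sorted(math.dist(p, self.center) for p in points)
        # Radius = distance at the given quantile of the training distances.
        self.radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]

    def accepts(self, x):
        return math.dist(x, self.center) <= self.radius


def fit_ensemble(train, k):
    """Steps 1 and 2: one classifier per identified cluster."""
    return [CentroidOneClass(cluster) for cluster in kmeans(train, k)]


def ensemble_accepts(members, x):
    """Step 3 (assumed rule): a point is in the target class if any member accepts it."""
    return any(m.accepts(x) for m in members)


def select_ensemble(train, held_out, candidate_ks=(1, 2, 3), min_recall=0.8):
    """Step 4 (assumed criterion): smallest covered volume with enough held-out recall."""
    dim = len(train[0])
    best = None
    for k in candidate_ks:
        members = fit_ensemble(train, k)
        recall = sum(ensemble_accepts(members, x) for x in held_out) / len(held_out)
        volume = sum(m.radius ** dim for m in members)  # proxy, not a true volume
        if recall >= min_recall and (best is None or volume < best[0]):
            best = (volume, k, members)
    if best is None:
        raise ValueError("no candidate reached the required held-out recall")
    return best[1], best[2]
```

On bimodal target data, a single classifier over all points has to cover the empty region between the modes, whereas the per-cluster ensemble can reject points there. This is the multimodality issue the abstract describes.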
