Conference: European Conference on Machine Learning and Knowledge Discovery in Databases

Extension of the Rocchio Classification Method to Multi-modal Categorization of Documents in Social Media

Abstract

Most approaches to multi-view categorization use early fusion, late fusion, or co-training strategies. We propose a novel classification method that efficiently captures the interactions across the different modes. The method is a multi-modal extension of the Rocchio classification algorithm, which is very popular in the Information Retrieval community. The extension consists of simultaneously maintaining several "centroid" representations for each class, in particular "cross-media" centroids that correspond to pairs of modes. To classify a new data point, separate scores are derived from similarity measures between that point and each of these centroids; a global classification score is then obtained by suitably aggregating the individual scores. On a social media corpus, the ENRON email collection, this method outperforms multi-view logistic regression (with either early or late fusion) on two very different categorization tasks: folder classification and recipient prediction.
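
The abstract does not spell out how the cross-media centroids are built or how the scores are aggregated, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes cosine similarity, an unweighted sum as the aggregation, and a hypothetical cross-media centroid for a pair of modes (a, b): the mean of the mode-b vectors of the k training documents that are closest, in mode a, to the class's mode-a centroid. The function names, the parameter `k`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def l2_normalize(X):
    """Scale each row to unit length; all-zero rows are left as zeros."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0.0, 1.0, norms)

def fit_centroids(views, y, k=2):
    """views: dict mode -> (n_docs, n_features) array; y: class labels.

    Returns (classes, centroids). centroids[(a, b)] has one row per class:
      (a, a): plain Rocchio centroid of the class in mode a
      (a, b): ASSUMED cross-media centroid, expressed in mode b's features
    """
    y = np.asarray(y)
    classes = np.unique(y)
    views = {m: l2_normalize(X) for m, X in views.items()}
    centroids = {}
    for a, Xa in views.items():
        mono = l2_normalize(np.vstack([Xa[y == c].mean(axis=0) for c in classes]))
        centroids[(a, a)] = mono
        # k training documents most similar to each class centroid in mode a
        nearest = np.argsort(-(mono @ Xa.T), axis=1)[:, :k]
        for b, Xb in views.items():
            if b != a:
                # average those documents' mode-b vectors (hypothetical definition)
                centroids[(a, b)] = l2_normalize(Xb[nearest].mean(axis=1))
    return classes, centroids

def predict(views, classes, centroids, weights=None):
    """Sum the cosine scores against every centroid and pick the best class."""
    views = {m: l2_normalize(X) for m, X in views.items()}
    total = 0.0
    for (a, b), C in centroids.items():
        w = 1.0 if weights is None else weights.get((a, b), 0.0)
        # the centroid keyed (a, b) lives in mode b's feature space
        total = total + w * (views[b] @ C.T)
    return classes[np.argmax(total, axis=1)]

# Toy data: a "text" mode and a "people" mode for four emails in two folders.
train = {
    "text": np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]),
    "people": np.array([[1, 0], [1, 0], [0, 1], [0, 1]]),
}
labels = ["inbox", "inbox", "projects", "projects"]
classes, centroids = fit_centroids(train, labels, k=2)

new = {"text": np.array([[2, 1, 0], [0, 0, 0]]), "people": np.array([[1, 0], [0, 1]])}
print(predict(new, classes, centroids))
```

The second toy email has an empty text vector, so only the scores computed in the "people" space contribute to its prediction. The optional `weights` dictionary is a placeholder for whatever aggregation is actually appropriate; in practice it would need to be tuned on held-out data.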