International Journal of Computational Intelligence and Applications

A Scalable Meta-Classifier Combining Search and Classification Techniques for Multi-Level Text Categorization


Abstract

Documents are increasingly associated with multi-level category hierarchies rather than a flat category scheme. As the volume and diversity of documents grow, so do the size and complexity of the corresponding category hierarchies, and accessing such hierarchically classified documents in real time requires fast automatic methods for navigating them. Data domains also differ widely from one another, for example medicine and politics, and such distinct domains can be handled by different classifiers. A document representation that incorporates the inherent category structure of the data should also add useful semantic content to the data vectors and thus make the classes more separable. In this paper, we present a scalable meta-classifier for multi-level classification of large datasets. To speed up classification, we use a search-based method, built on a category-hierarchy-based vector representation, to detect the level-1 category of a test document. We evaluate the meta-classifier by scaling to both longer documents and a larger category set, and show that it is robust in both cases. We test the meta-classifier architecture with six different base classifiers (Random forest, C4.5, multilayer perceptron, naive Bayes, BayesNet (BN) and PART). Although performance varies only slightly across these architectures, all of them perform much better than the corresponding single baseline classifiers. We conclude that the improvement in classification performance comes from the meta-classifier architecture rather than from the base classifiers themselves, and that this architecture has substantial potential.
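The abstract does not specify the vector representation, the search method, or how the base classifiers are attached to the hierarchy, so the following is only a minimal Python sketch of the general two-stage idea under stated assumptions, not the authors' method. It assumes a two-level hierarchy; it stands in for the "category-hierarchy-based vector representation" with one summed term-count prototype per level-1 category; it treats the search step as a nearest-prototype lookup by cosine similarity; and it uses multinomial naive Bayes (one of the six base classifiers named above) as the per-branch level-2 classifier. All class and function names, and the toy data, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoLevelMetaClassifier:
    """Hypothetical sketch: a search step picks level 1, a per-branch model picks level 2."""

    def fit(self, docs, paths):
        # paths: one (level1, level2) tuple per training document.
        self.prototypes = defaultdict(Counter)   # level-1 prototype vectors
        grouped = defaultdict(lambda: ([], []))  # level-1 -> (docs, level-2 labels)
        for doc, (top, sub) in zip(docs, paths):
            self.prototypes[top].update(tokenize(doc))
            grouped[top][0].append(doc)
            grouped[top][1].append(sub)
        # One base classifier per level-1 branch, trained only on that branch.
        self.branch_models = {
            top: NaiveBayes().fit(d, l) for top, (d, l) in grouped.items()
        }
        return self

    def predict(self, doc):
        query = Counter(tokenize(doc))
        # Search step: nearest level-1 prototype by cosine similarity.
        top = max(self.prototypes, key=lambda c: cosine(query, self.prototypes[c]))
        # Classification step: delegate to that branch's base classifier.
        return top, self.branch_models[top].predict(doc)


# Toy usage with made-up documents.
docs = ["insulin dose diabetes patient", "blood sugar insulin glucose",
        "tumor chemotherapy oncology patient", "cancer tumor radiation therapy",
        "election vote parliament campaign", "ballot vote election turnout",
        "tax budget parliament bill", "budget deficit tax spending"]
paths = [("medicine", "diabetes")] * 2 + [("medicine", "oncology")] * 2 \
    + [("politics", "elections")] * 2 + [("politics", "fiscal")] * 2
clf = TwoLevelMetaClassifier().fit(docs, paths)
print(clf.predict("insulin glucose levels"))  # → ('medicine', 'diabetes')
```

The design point the sketch tries to illustrate is that the cheap search step narrows the problem to one branch, so each base classifier only has to separate the level-2 categories within a single domain; swapping `NaiveBayes` for any other base classifier leaves the surrounding architecture unchanged.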