A Scalable Meta-Classifier Combining Search and Classification Techniques for Multi-Level Text Categorization

Nandita Tripathi; Michael Oakes; Stefan Wermter

Abstract

Nowadays, documents are increasingly associated with multi-level category hierarchies rather than a flat category scheme. As the volume and diversity of documents grow, so do the size and complexity of the corresponding category hierarchies. To access such hierarchically classified documents in real time, we need fast automatic methods to navigate these hierarchies. Data domains today also differ widely from one another, for example medicine and politics, and such distinct domains can be handled by different classifiers. A document representation that incorporates the inherent category structure of the data should also add useful semantic content to the data vectors and thus lead to better class separability. In this paper, we present a scalable meta-classifier for multi-level classification of large datasets. To speed up classification, we use a search-based method, built on a category-hierarchy-based vector representation, to detect the level-1 category of a test document. We evaluate the meta-classifier by scaling it both to longer documents and to a larger category set, and show that it is robust in both cases. We test the meta-classifier architecture with six different base classifiers (Random forest, C4.5, multilayer perceptron, naive Bayes, BayesNet (BN) and PART). Although performance varies only slightly across these architectures, all of them perform much better than the corresponding single baseline classifiers. We conclude that the improvement in classification performance comes from the meta-classifier architecture rather than from the base classifiers themselves, and that this architecture has substantial potential.
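The two-stage flow described in the abstract (a fast search step picks the level-1 category, then a domain-specific classifier decides within that branch) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the search step is assumed to be a cosine-similarity lookup of a document's term counts against aggregated level-1 term profiles, and each level-1 domain is assumed to get its own nearest-centroid classifier, standing in for the base classifiers the paper actually evaluates (Random forest, C4.5, etc.). The class `TwoLevelMetaClassifier`, its methods and the toy training data are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, whitespace split.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class TwoLevelMetaClassifier:
    """Hypothetical sketch: a search step picks the level-1 category,
    then only that domain's classifier picks the level-2 category."""

    def __init__(self):
        self.level1_profiles = {}   # level-1 category -> summed term counts
        self.level2_centroids = {}  # level-1 category -> {level-2 category -> summed term counts}

    def fit(self, docs):
        # docs: iterable of (text, level1_label, level2_label).
        # Summed counts act as centroids because cosine ignores vector length.
        for text, l1, l2 in docs:
            terms = Counter(tokenize(text))
            self.level1_profiles.setdefault(l1, Counter()).update(terms)
            self.level2_centroids.setdefault(l1, {}).setdefault(l2, Counter()).update(terms)
        return self

    def predict(self, text):
        query = Counter(tokenize(text))
        # Stage 1 (assumed search step): rank level-1 profiles by similarity.
        l1 = max(self.level1_profiles,
                 key=lambda c: cosine(query, self.level1_profiles[c]))
        # Stage 2: consult only the classifier for the chosen domain.
        children = self.level2_centroids[l1]
        l2 = max(children, key=lambda c: cosine(query, children[c]))
        return l1, l2


# Toy, made-up training data with two level-1 domains.
train = [
    ("heart disease treatment drug", "medicine", "cardiology"),
    ("heart surgery patient", "medicine", "cardiology"),
    ("cancer tumour chemotherapy drug", "medicine", "oncology"),
    ("election vote parliament", "politics", "elections"),
    ("candidate vote campaign", "politics", "elections"),
    ("tax budget parliament policy", "politics", "economy"),
]
clf = TwoLevelMetaClassifier().fit(train)
print(clf.predict("new drug for heart patient"))     # ('medicine', 'cardiology')
print(clf.predict("parliament vote on tax budget"))  # ('politics', 'economy')
```

Because only the classifier for the detected level-1 domain is consulted, each test document is scored against a fraction of the full category set, which is the kind of saving the abstract attributes to the search-based first stage. Ties (for example, a document sharing no terms with any profile) are resolved by insertion order in this sketch.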

Bibliographic details

Source
International Journal of Computational Intelligence and Applications | 2015, Issue 4 | 16 pages
Authors
Nandita Tripathi; Michael Oakes; Stefan Wermter
Original format: PDF
Language: English
Chinese Library Classification: Artificial intelligence theory
Keywords
Large-scale datasets; Meta-classifiers; Multi-level classification; Text categorization; Parallel classifiers; Semantic representation of data

Similar documents

1. A Scalable Meta-Classifier Combining Search and Classification Techniques for Multi-Level Text Categorization [J]. Nandita Tripathi, Michael Oakes, Stefan Wermter. International Journal of Computational Intelligence and Applications, 2015, Issue 4.

2. Feature Selection for Efficient Text Categorization and Knowledge Discovery Using Classification Techniques [J]. A. Christy, P. Thambidurai. Asian Journal of Information Technology, 2006, Issue 8.

3. Ensemble Text Classifier: A Document Classification Technique to Predict and Categorizes Regularised and Novel Classes Using Incremental Learning [J]. G. Silambarasan, J. Anvar Shathik. International Journal of Applied Engineering Research, 2017, Issue 22aPta5.

4. The relationship of text categorization using Dewey Decimal Classification techniques [C]. Watthananon Julaluk. International Conference on ICT and Knowledge Engineering, 2014.

5. Phonetic categorization and classification using acoustic-phonetic and artificial intelligence techniques [D]. Eugene Grinberg. 2002.

6. Combining Text Classification and Hidden Markov Modeling Techniques for Structuring Randomized Clinical Trial Abstracts [O]. Rong Xu, Kaustubh Supekar, Yang Huang. 2006.

7. Personalized news categorization through scalable text classification [O]. Ioannis Antonellis, Christos Bouras, Vassilis Poulopoulos. 2006.

