Preferential text classification: learning algorithms and evaluation measures

Fabio Aiolli; Riccardo Cardin; Fabrizio Sebastiani; Alessandro Sperduti

Abstract

In many application contexts in which textual documents are labelled with thematic categories, a distinction is made between the primary categories of a document, which represent the topics that are central to it, and its secondary categories, which represent topics that the document only touches upon. We contend that this distinction, so far neglected in text categorization research, is important and deserves to be explicitly tackled. The contribution of this paper is threefold. First, we propose an evaluation measure for this preferential text categorization task, whereby different kinds of misclassifications involving either primary or secondary categories have a different impact on effectiveness. Second, we establish several baseline results for this task on a well-known benchmark for patent classification in which the distinction between primary and secondary categories is present; these results are obtained by reformulating the preferential text categorization task in terms of well-established classification problems, such as single-label and/or multi-label multiclass classification; state-of-the-art learning technology, such as SVMs and kernel-based methods, is used. Third, we improve on these results by using a recently proposed class of algorithms explicitly devised for learning from training data expressed in preferential form, i.e., in the form "for document d_i, category c' is preferred to category c''"; this allows us to distinguish between primary and secondary categories not only in the classification phase but also in the learning phase, thus differentiating their impact on the classifiers to be generated.
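The preferential form mentioned in the abstract can be illustrated with a minimal sketch. It assumes one plausible encoding, which is not necessarily the one used in the paper: for a given document, every primary category is preferred to every secondary category, and both are preferred to every category not assigned to the document. The function name `preference_pairs` and the category labels are hypothetical.

```python
from itertools import product


def preference_pairs(primary, secondary, categories):
    """Return sorted (preferred, less_preferred) category pairs for one document.

    Assumed ordering (an illustration, not the paper's definition):
    primary > secondary > all remaining categories.
    """
    primary = set(primary)
    secondary = set(secondary)
    others = set(categories) - primary - secondary

    pairs = set()
    pairs.update(product(primary, secondary))  # primary preferred to secondary
    pairs.update(product(primary, others))     # primary preferred to unassigned
    pairs.update(product(secondary, others))   # secondary preferred to unassigned
    return sorted(pairs)


# Hypothetical document with primary category "A" and secondary category "B"
# in a three-category scheme.
print(preference_pairs({"A"}, {"B"}, ["A", "B", "C"]))
```

Each pair (c', c'') reads "for this document, category c' is preferred to category c''". A preference-learning algorithm would consume such pairs as training constraints, whereas a standard multi-label classifier would see only the unordered set of assigned categories.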

Bibliographic record

Source
Information Retrieval, 2009, no. 5, pp. 559-580 (22 pages)
Authors
Fabio Aiolli; Riccardo Cardin; Fabrizio Sebastiani; Alessandro Sperduti
Author affiliations

Dipartimento di Matematica Pura e Applicata, Università di Padova, Via Trieste 63, 35121 Padova, Italy;

Dipartimento di Matematica Pura e Applicata, Università di Padova, Via Trieste 63, 35121 Padova, Italy;

Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Via Giuseppe Moruzzi 1, 56124 Pisa, Italy;

Dipartimento di Matematica Pura e Applicata, Università di Padova, Via Trieste 63, 35121 Padova, Italy

Original format: PDF
Language of text: English
Keywords
preferential learning; supervised learning; text categorization; text classification; primary and secondary categories

