Utility-Theoretic Ranking for Semiautomated Text Classification

GIACOMO BERARDI; ANDREA ESULI; FABRIZIO SEBASTIANI


Abstract

Semiautomated Text Classification (SATC) may be defined as the task of ranking a set V of automatically labelled textual documents in such a way that, if a human annotator validates (i.e., inspects and corrects where appropriate) the documents in a top-ranked portion of V with the goal of increasing the overall labelling accuracy of V, the expected increase is maximized. An obvious SATC strategy is to rank V so that the documents that the classifier has labelled with the lowest confidence are top ranked. In this work, we show that this strategy is suboptimal. We develop new utility-theoretic ranking methods based on the notion of validation gain, defined as the improvement in classification effectiveness that would derive from validating a given automatically labelled document. We also propose a new effectiveness measure for SATC-oriented ranking methods, based on the expected reduction in classification error brought about by partially validating a list generated by a given ranking method. We report the results of experiments showing that, with respect to the baseline method mentioned earlier, and according to the proposed measure, our utility-theoretic ranking methods can achieve substantially higher expected reductions in classification error.
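
The contrast between the two ranking strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the names `LabelledDoc`, `p_error`, `gain_fp`, and `gain_fn` are hypothetical, the error probabilities and gain values are made-up numbers, and the utility is assumed to be the probability of a labelling error multiplied by the gain obtained by correcting that type of error.

```python
from dataclasses import dataclass


@dataclass
class LabelledDoc:
    doc_id: str
    predicted_positive: bool  # label assigned by the classifier
    p_error: float            # assumed estimate of P(label is wrong) = 1 - confidence


def rank_by_low_confidence(docs):
    """Baseline: documents labelled with the lowest confidence come first."""
    return sorted(docs, key=lambda d: d.p_error, reverse=True)


def rank_by_expected_gain(docs, gain_fp, gain_fn):
    """Assumed utility: P(error) times the gain of correcting that error type.

    gain_fp is the (hypothetical) gain of correcting a false positive,
    gain_fn the gain of correcting a false negative.
    """
    def expected_gain(d):
        gain = gain_fp if d.predicted_positive else gain_fn
        return d.p_error * gain

    return sorted(docs, key=expected_gain, reverse=True)


# Illustrative data only; none of these numbers come from the paper.
docs = [
    LabelledDoc("a", True, 0.40),
    LabelledDoc("b", False, 0.30),
    LabelledDoc("c", True, 0.10),
]

baseline = [d.doc_id for d in rank_by_low_confidence(docs)]
utility = [d.doc_id for d in rank_by_expected_gain(docs, gain_fp=1.0, gain_fn=2.0)]
```

With these illustrative values, document "b" is less likely to be mislabelled than "a", but correcting it is assumed to be worth twice as much, so the gain-weighted ranking places it first while the confidence-only baseline does not.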


Bibliographic details

Source
ACM Transactions on Knowledge Discovery from Data | 2016, Issue 1 | pp. 6.1-6.32 | 32 pages
Authors
GIACOMO BERARDI; ANDREA ESULI; FABRIZIO SEBASTIANI
Author affiliations
Italian National Council of Research; Qatar Computing Research Institute
Original format
PDF
Language of text
English

