Journal: ACM Transactions on Knowledge Discovery from Data

Utility-Theoretic Ranking for Semiautomated Text Classification

Abstract

Semiautomated Text Classification (SATC) may be defined as the task of ranking a set V of automatically labelled textual documents in such a way that, if a human annotator validates (i.e., inspects and corrects where appropriate) the documents in a top-ranked portion of V with the goal of increasing the overall labelling accuracy of V, the expected increase is maximized. An obvious SATC strategy is to rank V so that the documents that the classifier has labelled with the lowest confidence are top ranked. In this work, we show that this strategy is suboptimal. We develop new utility-theoretic ranking methods based on the notion of validation gain, defined as the improvement in classification effectiveness that would derive by validating a given automatically labelled document. We also propose a new effectiveness measure for SATC-oriented ranking methods, based on the expected reduction in classification error brought about by partially validating a list generated by a given ranking method. We report the results of experiments showing that, with respect to the baseline method mentioned earlier, and according to the proposed measure, our utility-theoretic ranking methods can achieve substantially higher expected reductions in classification error.
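
The contrast between the confidence-based baseline and a gain-based ranking can be illustrated with a small sketch. This is not the authors' formulation, only a simplified illustration built on several assumptions that do not come from the abstract: classification is binary, effectiveness is measured with F1, the classifier outputs calibrated probabilities, and the expected contingency table is estimated by summing those probabilities. The names `Doc`, `rank_by_low_confidence`, and `rank_by_expected_gain` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    predicted_positive: bool  # label assigned by the classifier
    prob_positive: float      # assumed calibrated P(true label is positive)


def f1(tp: float, fp: float, fn: float) -> float:
    """F1 from (possibly fractional) contingency-table counts."""
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def error_prob(d: Doc) -> float:
    """Probability that the automatically assigned label is wrong."""
    return 1 - d.prob_positive if d.predicted_positive else d.prob_positive


def rank_by_low_confidence(docs):
    """Baseline: least confident (most likely wrong) documents first."""
    return sorted(docs, key=error_prob, reverse=True)


def rank_by_expected_gain(docs):
    """Rank by P(label is wrong) times the F1 gain from correcting it."""
    # Expected contingency table, estimated from the probabilities (assumption).
    tp = sum(d.prob_positive for d in docs if d.predicted_positive)
    fp = sum(1 - d.prob_positive for d in docs if d.predicted_positive)
    fn = sum(d.prob_positive for d in docs if not d.predicted_positive)
    base = f1(tp, fp, fn)
    # Correcting a false positive removes one FP; correcting a false negative
    # turns one FN into a TP. Counts are clamped at zero for tiny collections.
    gain_fp = f1(tp, max(fp - 1, 0.0), fn) - base
    gain_fn = f1(tp + 1, fp, max(fn - 1, 0.0)) - base

    def utility(d: Doc) -> float:
        gain = gain_fp if d.predicted_positive else gain_fn
        return error_prob(d) * gain

    return sorted(docs, key=utility, reverse=True)


# Toy collection (made-up numbers, for illustration only).
docs = [
    Doc("A", True, 0.55),
    Doc("B", False, 0.44),
    Doc("C", True, 0.90),
    Doc("D", True, 0.95),
    Doc("E", False, 0.05),
]
baseline = [d.doc_id for d in rank_by_low_confidence(docs)]
by_gain = [d.doc_id for d in rank_by_expected_gain(docs)]
print("low confidence first:", baseline)
print("expected gain first: ", by_gain)
```

In this toy collection the baseline puts document A first because its label is the least certain, while the gain-based ranking puts B first: under the F1 assumption, correcting a likely false negative is estimated to improve effectiveness more than correcting a likely false positive, which outweighs the small difference in confidence.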
