Conference: IAPR International Conference on Document Analysis and Recognition

Learning a Fast Bipartite Ranker for Text Documents using Lexicographical Rankers and ROC Curves

Abstract

The design of powerful learning methods for handling huge amounts of unstructured data, such as text documents, is a fundamental problem within the document analysis and recognition community. In this work, we propose FlexRank, a bipartite ranking algorithm designed specifically for text documents that uses lexicographical ordering. FlexRank is based on the area under the ROC curve (ROC AUC), a well-known metric used to evaluate ranking and classification algorithms and to select features in text classification. In our proposal, we derive an expression for the exact increment of ROC AUC caused by each attribute inserted into a lexicographic model. Based on this calculation, FlexRank performs an internal feature selection using the area under the ROC curve to define its lexicographic ranker, which can speed up ranking by sorting instances in linear time using most significant digit (MSD) radix sort. We empirically evaluated FlexRank on a range of text datasets and compared its speed and ROC AUC with those of Support Vector Machines, Decision Trees, Naive Bayes, K-nearest neighbours, and LexRank. FlexRank was shown to be much faster than all the other methods, while retaining competitive ROC AUC performance.
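
The abstract does not give FlexRank's increment formula or its implementation, so the following is only a rough sketch of the general idea, not the authors' method. It assumes binary term-presence features, greedily appends whichever feature most improves ROC AUC (recomputing AUC naively rather than using an exact increment), and then ranks documents with an MSD-style partition on the selected features. All names (`auc`, `lex_keys`, `greedy_lex_ranker`, `msd_rank`) and the toy data are hypothetical.

```python
from itertools import product


def auc(keys, labels):
    """ROC AUC of the ranking induced by comparable keys; ties count 0.5."""
    pos = [k for k, y in zip(keys, labels) if y == 1]
    neg = [k for k, y in zip(keys, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def lex_keys(X, order):
    """Lexicographic key of each row: feature values in priority order."""
    return [tuple(row[j] for j in order) for row in X]


def greedy_lex_ranker(X, labels):
    """Greedily append the feature that yields the highest AUC.

    Assumption: AUC is recomputed from scratch for each candidate,
    which is simpler but slower than an exact incremental update.
    """
    order, best = [], 0.5  # an empty model ties every pair: AUC 0.5
    remaining = set(range(len(X[0])))
    while remaining:
        scores = {j: auc(lex_keys(X, order + [j]), labels) for j in remaining}
        j = max(sorted(scores), key=scores.get)  # lowest index wins ties
        if scores[j] <= best:
            break  # no remaining feature improves AUC
        order.append(j)
        best = scores[j]
        remaining.remove(j)
    return order, best


def msd_rank(X, order, idx=None, depth=0):
    """MSD radix partition on binary digits; returns row indices, best first."""
    if idx is None:
        idx = list(range(len(X)))
    if depth == len(order) or len(idx) <= 1:
        return idx
    j = order[depth]
    ones = [i for i in idx if X[i][j] == 1]
    zeros = [i for i in idx if X[i][j] == 0]
    return msd_rank(X, order, ones, depth + 1) + msd_rank(X, order, zeros, depth + 1)


# Toy data: rows are documents, columns are term-presence flags.
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]

order, best = greedy_lex_ranker(X, y)
ranking = msd_rank(X, order)
print("feature order:", order, "AUC:", best)
print("ranking (document indices, best first):", ranking)
```

Each recursion level of `msd_rank` makes a single pass over its bucket, so the cost grows with the number of documents times the number of selected features, without any comparison-based sort.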
