Learning a Fast Bipartite Ranker for Text Documents using Lexicographical Rankers and ROC Curves



Abstract

The design of powerful learning methods for handling huge amounts of unstructured data, such as text documents, is a fundamental problem within the document analysis and recognition community. In this work, we propose FlexRank, a bipartite ranking algorithm designed specifically for text documents that uses lexicographical ordering. FlexRank is based on the area under the ROC curve (ROC AUC), a well-known metric used to evaluate ranking and classification algorithms and to select features in text classification. In our proposal, we express the calculation of the exact increment of ROC AUC caused by each attribute inserted into a lexicographic model. Based on this calculation, FlexRank performs an internal feature selection using the area under the ROC curve to define its lexicographic ranker, which can speed up ranking by sorting instances in linear time using most significant digit (MSD) radix sort. We empirically evaluated FlexRank on a range of text datasets and compared its speed and ROC AUC with those of Support Vector Machines, Decision Trees, Naive Bayes, K-nearest neighbours, and LexRank. FlexRank was shown to be much faster than all the other methods, while retaining competitive ROC AUC performance.
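The ideas named in the abstract (a lexicographic ranker, attribute selection driven by ROC AUC, and MSD radix sorting) can be illustrated with a minimal Python sketch. This is not the authors' FlexRank implementation: the function names (`roc_auc`, `greedy_lexicographic_order`, `msd_radix_rank`) are hypothetical, features are assumed to be binary with 1 ranking above 0, and the AUC is recomputed from scratch for each candidate attribute rather than through the exact incremental formula the paper derives.

```python
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence, labels: Sequence[int]) -> float:
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly,
    counting ties as half. Scores only need to support > and ==."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lex_key(row: Sequence[int], order: Sequence[int]) -> Tuple[int, ...]:
    """Lexicographic key: selected attributes, most significant first."""
    return tuple(row[j] for j in order)


def greedy_lexicographic_order(
    X: List[List[int]], y: List[int], max_features: int
) -> Tuple[List[int], float]:
    """Assumed greedy scheme: append the attribute that raises AUC the most,
    and stop when no remaining attribute improves it."""
    order: List[int] = []
    best = 0.5  # empty model ties every pair
    while len(order) < max_features:
        candidates = [j for j in range(len(X[0])) if j not in order]
        if not candidates:
            break
        auc, j = max(
            (roc_auc([lex_key(r, order + [j]) for r in X], y), j)
            for j in candidates
        )
        if auc <= best:
            break
        order.append(j)
        best = auc
    return order, best


def msd_radix_rank(X: List[List[int]], order: Sequence[int]) -> List[int]:
    """Row indices from highest to lowest rank via MSD radix sort on binary
    digits. Each level scans its rows once, so the cost is O(n * len(order))."""
    def sort(idx: List[int], depth: int) -> List[int]:
        if len(idx) <= 1 or depth == len(order):
            return idx
        j = order[depth]
        ones = [i for i in idx if X[i][j] == 1]
        zeros = [i for i in idx if X[i][j] == 0]
        return sort(ones, depth + 1) + sort(zeros, depth + 1)
    return sort(list(range(len(X))), 0)


if __name__ == "__main__":
    # Toy term-presence matrix (illustrative data, not from the paper).
    X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 0, 1], [0, 0, 0]]
    y = [1, 1, 1, 0, 0, 0]
    order, auc = greedy_lexicographic_order(X, y, max_features=3)
    print(order)                     # -> [1, 0]
    print(msd_radix_rank(X, order))  # -> [0, 2, 1, 4, 3, 5]
```

Because the ranker's score is just a tuple of selected attribute values, ranking reduces to a digit-by-digit partition of the instances, which is why a radix sort applies and no comparison-based sort is needed.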


Bibliographic details

Source
IAPR International Conference on Document Analysis and Recognition | 2017 | pp. 733-1472 | 6 pages
Authors
Lucas de Souza Rodrigues; Edson Takashi Matsubara; Bruno Magalhães Nogueira
Original format: PDF
Chinese Library Classification: TP391.41-53
Keywords
Bipartite ranker; ROC curves; Text documents; Lexicographical rankers


Similar literature

1. Ranker Enhancement for Proximity-Based Ranking of Biomedical Texts [J]. Rey-Long Liu, Yi-Chih Huang. Journal of the American Society for Information Science and Technology. 2011, No. 12
2. Fast First-Phase Candidate Generation for Cascading Rankers [J]. Qi Wang, Constantinos Dimopoulos, Torsten Suel. ACM SIGIR Forum. 2016, issue Jul17a21CD
3. Learning a Fast Bipartite Ranker for Text Documents Using Lexicographical Rankers and ROC Curves [C]. Lucas de Souza Rodrigues, Edson Takashi Matsubara, Bruno Magalhães Nogueira. IAPR International Conference on Document Analysis and Recognition. 2017
4. Indexing Text Documents for Fast Evaluation of Regular Expressions [D]. Chen, Ting. 2012
5. What is relevant in a text document?: An interpretable machine learning approach [O]. Leila Arras, Franziska Horn, Grégoire Montavon, et al.
6. Learning to Select Rankers [O]. Niranjan Balasubramanian, James Allan. 2011
