Scaling Up Text Classification for Large File Systems

Abstract

We combine the speed and scalability of information retrieval with the generally superior classification accuracy offered by machine learning, yielding a two-phase text classifier that can scale to very large document corpora. We investigate the effect of different methods of formulating the query from the training set, as well as of varying the query size. In empirical tests on the Reuters RCV1 corpus of 806,000 documents, we find that runtime was easily reduced by a factor of 27, with a somewhat surprising gain in F-measure compared with traditional text classification.
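
The two-phase idea can be illustrated with a minimal sketch. The abstract does not say which query-formulation methods or which learning algorithm were used, so everything below is an assumption made for illustration: the query is built from the terms whose document frequency is most skewed toward the positive training documents, phase 1 is a plain inverted-index lookup, and phase 2 is a small multinomial Naive Bayes classifier. All function names (`build_index`, `formulate_query`, `retrieve`, `train_naive_bayes`, `predict`, `two_phase_classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def build_index(docs):
    """Inverted index mapping each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(tokenize(text)):
            index[term].add(doc_id)
    return index


def formulate_query(pos_docs, neg_docs, query_size):
    """Pick the query_size terms whose document frequency is most skewed
    toward the positive training documents (one possible heuristic)."""
    pos_df = Counter(t for d in pos_docs for t in set(tokenize(d)))
    neg_df = Counter(t for d in neg_docs for t in set(tokenize(d)))

    def score(term):
        return pos_df[term] / len(pos_docs) - neg_df[term] / max(len(neg_docs), 1)

    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(pos_df, key=lambda t: (-score(t), t))
    return ranked[:query_size]


def retrieve(index, query):
    """Phase 1: union of the posting lists of the query terms."""
    candidates = set()
    for term in query:
        candidates |= index.get(term, set())
    return candidates


def train_naive_bayes(pos_docs, neg_docs):
    """Collect per-class term counts, the vocabulary, and log priors."""
    counts = {True: Counter(), False: Counter()}
    for label, docs in ((True, pos_docs), (False, neg_docs)):
        for d in docs:
            counts[label].update(tokenize(d))
    vocab = set(counts[True]) | set(counts[False])
    total = len(pos_docs) + len(neg_docs)
    priors = {True: math.log(len(pos_docs) / total),
              False: math.log(len(neg_docs) / total)}
    return counts, vocab, priors


def predict(model, text):
    """Phase 2: Naive Bayes decision with add-one smoothing."""
    counts, vocab, priors = model
    scores = {}
    for label in (True, False):
        denom = sum(counts[label].values()) + len(vocab)
        scores[label] = priors[label] + sum(
            math.log((counts[label][t] + 1) / denom)
            for t in tokenize(text) if t in vocab)
    return scores[True] > scores[False]


def two_phase_classify(docs, pos_docs, neg_docs, query_size=3):
    """Run the learned classifier only on documents returned by the query."""
    index = build_index(docs)
    query = formulate_query(pos_docs, neg_docs, query_size)
    candidates = retrieve(index, query)
    model = train_naive_bayes(pos_docs, neg_docs)
    return {doc_id for doc_id in candidates if predict(model, docs[doc_id])}
```

In this sketch the saving comes from `predict` running only on the candidate set rather than on every document; `query_size` controls how many documents phase 1 lets through, which is the trade-off the abstract describes when it mentions varying the query size.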
