Conference: International Conference on Database Systems for Advanced Applications

Fast text classification: a training-corpus pruning based approach

Abstract

With the rapid growth of available online information, text classification is becoming increasingly important. kNN is a widely used, high-performance text classification method. However, it is inefficient because it requires a large amount of computation to evaluate the similarity between a test document and every training document. In this paper, we propose a fast kNN text classification approach based on pruning the training corpus. With this approach, the training corpus can be condensed sharply, so the time spent on kNN search is cut significantly and classification efficiency improves substantially, while classification performance remains comparable to that obtained without pruning. An effective algorithm for text corpus pruning is designed. Experiments on the Reuters corpus validate the practicability of the proposed approach. Our approach is especially suitable for online text classification applications.
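
The abstract does not spell out the pruning algorithm, so the following is only a minimal sketch of the general idea: brute-force kNN computes one similarity per training document, so shrinking the corpus shrinks the search cost. The pruning rule used here is a Hart-style condensed nearest neighbor pass, swapped in as a stand-in and not the authors' algorithm. The names `vectorize`, `cosine`, `knn_classify`, and `prune_corpus`, and the toy documents, are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(query, corpus, k=3):
    """Brute-force kNN: one similarity computation per training document."""
    ranked = sorted(corpus, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def prune_corpus(corpus):
    """Stand-in pruning rule (Hart-style condensing, NOT the paper's algorithm):
    keep a document only if the documents kept so far misclassify it with 1-NN."""
    if not corpus:
        return []
    kept_idx = [0]
    changed = True
    while changed:
        changed = False
        for i, (vec, label) in enumerate(corpus):
            if i in kept_idx:
                continue
            kept = [corpus[j] for j in kept_idx]
            if knn_classify(vec, kept, k=1) != label:
                kept_idx.append(i)
                changed = True
    return [corpus[j] for j in kept_idx]


# Hypothetical toy corpus; the paper's experiments use the Reuters corpus.
docs = [
    ("oil prices rise crude barrel", "energy"),
    ("crude oil barrel exports", "energy"),
    ("oil output crude production", "energy"),
    ("wheat grain harvest crop", "grain"),
    ("grain exports wheat tonnes", "grain"),
    ("corn crop grain harvest", "grain"),
]
corpus = [(vectorize(text), label) for text, label in docs]
pruned = prune_corpus(corpus)

query = vectorize("crude oil barrel")
full_label = knn_classify(query, corpus, k=3)
pruned_label = knn_classify(query, pruned, k=1)
```

On this toy data the pruned corpus is smaller than the original, and the query receives the same label from both, which mirrors the trade-off the abstract describes: fewer similarity computations per query with comparable classification results. A real pruning rule would need to be validated on held-out data, since aggressive condensing can discard documents near class boundaries.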