Conference: International Conference on String Processing and Information Retrieval

Cross-Comparison for Two-Dimensional Text Categorization

Abstract

The organization of large text collections is the main goal of automated text categorization. In particular, the final aim is to classify documents into a certain number of pre-defined categories efficiently and as accurately as possible. On-line and run-time services, such as personalization and information filtering services, have increased the importance of effective and efficient document categorization techniques. In recent years, a wide range of supervised learning algorithms has been applied to this problem. Recently, a new approach that exploits a two-dimensional summarization of the data for text classification was presented. This method does not go through a word-selection phase; instead, it uses the whole dictionary to present data in an intuitive way on two-dimensional graphs. Although successful in terms of classification effectiveness and efficiency (as recently shown in [3]), this method presents some unsolved key issues: the design of the training algorithm seems to be ad hoc for the Reuters-21578 collection; the evaluation has been done only on the 10 most frequent classes of the Reuters-21578 dataset; the evaluation lacks measures of significance in most parts; and the method adopted lacks a mathematical justification. We focus on the first three aspects, leaving the fourth as future work.
