Conference: Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

Comparison of Documents Classification Techniques to Classify Medical Reports

Abstract

This paper addresses a real-world problem: the classification of text documents in the medical domain. There are a number of approaches to classifying text documents. Here, we use a partially supervised classification approach and argue that it is effective and computationally efficient for real-world problems. The approach uses a two-step strategy to reduce the effort required to label documents for classification. Only a small set of positive documents is labeled initially; the others are labeled automatically as a result of the first step. The second step builds the actual text classifier. A number of methods have been proposed for each step. We conduct a comprehensive evaluation of various combinations of these methods, comparing their performance on real-world medical documents. The results show that using EM-based methods to build the classifier yields better results than SVM. We also show experimentally that careful selection of a subset of features to represent the documents can improve classifier performance.
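
The two-step strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not say which concrete methods were used in each step, so everything here is an assumption for illustration only: step 1 picks "reliable negatives" as unlabeled documents with low cosine similarity to the centroid of the positive set, and step 2 fits a multinomial naive Bayes classifier refined with EM over the remaining unlabeled documents. The function names (`split_unlabeled`, `train_nb`, `posterior`, `two_step_em`), the threshold, and the toy documents are all hypothetical, not taken from the paper.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def centroid(docs):
    """Average term-frequency vector of a non-empty list of token lists."""
    total = Counter()
    for d in docs:
        total.update(d)
    return {t: c / len(docs) for t, c in total.items()}

def cosine(doc, cen):
    vec = Counter(doc)
    dot = sum(v * cen.get(t, 0.0) for t, v in vec.items())
    norm = math.sqrt(sum(v * v for v in vec.values())) * \
           math.sqrt(sum(v * v for v in cen.values()))
    return dot / norm if norm else 0.0

def split_unlabeled(positive, unlabeled, threshold=0.2):
    """Step 1 (assumed method): unlabeled docs that look unlike the
    positive centroid become reliable negatives; the rest stay unlabeled."""
    cen = centroid(positive)
    negatives, rest = [], []
    for d in unlabeled:
        (negatives if cosine(d, cen) < threshold else rest).append(d)
    return negatives, rest

def train_nb(docs, weights, vocab):
    """M-step: weighted multinomial naive Bayes with Laplace smoothing.
    weights[i] is the current estimate of P(positive | docs[i])."""
    prior, cond = {}, {}
    for c in (0, 1):
        w = [wt if c == 1 else 1.0 - wt for wt in weights]
        prior[c] = (sum(w) + 1.0) / (len(docs) + 2.0)
        counts = Counter()
        for d, wi in zip(docs, w):
            for t in d:
                counts[t] += wi
        total = sum(counts.values()) + len(vocab)
        cond[c] = {t: (counts[t] + 1.0) / total for t in vocab}
    return prior, cond

def posterior(doc, prior, cond):
    """E-step for one document: P(positive | doc). Unknown tokens are skipped."""
    logp = {}
    for c in (0, 1):
        logp[c] = math.log(prior[c]) + sum(
            math.log(cond[c][t]) for t in doc if t in cond[c])
    m = max(logp.values())
    e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
    return e1 / (e0 + e1)

def two_step_em(positive, unlabeled, iterations=5, threshold=0.2):
    """Step 2 (assumed method): naive Bayes refined by EM."""
    negatives, rest = split_unlabeled(positive, unlabeled, threshold)
    vocab = {t for d in positive + unlabeled for t in d}
    # Initial classifier uses only the positives and the reliable negatives.
    fixed = [1.0] * len(positive) + [0.0] * len(negatives)
    prior, cond = train_nb(positive + negatives, fixed, vocab)
    docs = positive + negatives + rest
    for _ in range(iterations):
        soft = [posterior(d, prior, cond) for d in rest]  # E-step
        prior, cond = train_nb(docs, fixed + soft, vocab)  # M-step
    return prior, cond

# Toy, made-up documents; only the positives carry a manual label.
positive = [tokenize("patient chest pain cardiac infarction"),
            tokenize("cardiac arrest patient ecg abnormal")]
unlabeled = [tokenize("cardiac pain ecg patient"),
             tokenize("fracture left wrist xray"),
             tokenize("xray shows wrist fracture healing"),
             tokenize("chest pain cardiac ecg")]
prior, cond = two_step_em(positive, unlabeled)
print(posterior(tokenize("cardiac ecg abnormal"), prior, cond))
print(posterior(tokenize("wrist fracture xray"), prior, cond))
```

In this sketch only the two positive documents are labeled by hand. The two fracture reports share no terms with the positive centroid, so step 1 treats them as negatives, and the EM loop then assigns soft labels to the remaining unlabeled documents. A realistic pipeline would add the feature selection that the abstract reports as beneficial, which this sketch omits.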
