Conference: International work-conference on bioinformatics and biomedical engineering

Comparative Study of Feature Selection Methods for Medical Full Text Classification



Abstract

Much work on text categorization uses only the title and abstract of each paper. A full paper, however, contains far more information that could be used to improve classification performance. The potential benefits of using full texts come with an additional problem: the increased size of the data sets. To cope with this increase, we performed an assessment study on the use of feature selection methods for full-text classification. We compared two existing feature selection methods (Information Gain and Correlation) with a novel method called k-Best-Discriminative-Terms. The assessment was conducted using the Ohsumed corpora. We ran two sets of experiments: one using title and abstract only, and one using full text. The results show that the novel method does not perform well on small amounts of text such as titles and abstracts, but performs much better on the full-text data sets and requires a much smaller number of attributes.
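As an illustration of one of the baseline methods named in the abstract, the following is a minimal sketch of Information Gain term selection. It assumes each document is represented as a set of tokens with a single class label, and it treats a term as a binary present/absent feature. The function names (`entropy`, `information_gain`, `top_k_terms`) and the toy data are hypothetical and are not taken from the paper. The novel k-Best-Discriminative-Terms method is not sketched here because the abstract does not define it.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term is present in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Entropy of the labels after splitting on the term, weighted by split size.
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(
        vocabulary,
        key=lambda t: information_gain(docs, labels, t),
        reverse=True,
    )
    return ranked[:k]


# Toy data (invented for illustration only): documents as token sets.
docs = [
    {"heart", "attack"},
    {"heart", "surgery"},
    {"skin", "rash"},
    {"skin", "surgery"},
]
labels = ["cardio", "cardio", "derm", "derm"]

selected = top_k_terms(docs, labels, 2)
```

In this toy setting, "heart" and "skin" separate the two classes perfectly and rank highest, while "surgery" appears equally in both classes and carries no information. On full-text data the vocabulary is far larger, which is why keeping only the top-ranked terms matters for data set size.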
