Conference: International conference on bioinformatics and computational biology
Document Classification: A Novel Approach Based on SVM

Abstract

Science and engineering are large and important areas of research, and the studies and experiments conducted in them generate voluminous datasets. One example is biological databases such as PubMed, the database used in this paper, which contains millions of articles. In this paper we discuss the main challenge for databases of this type: retrieving the information relevant to a specific topic from a huge collection of articles, in other words, classifying those articles. We present a generic algorithm that can efficiently classify biomedical articles related to Minimotifs. Our algorithm is based on the support vector machine (SVM), an algorithm that has been used for a related problem. We compare our algorithm with the Gene Selection algorithm, which uses the SVM in a different way. Gene selection is the problem of identifying a minimum set of genes that are responsible for certain events (for example, the presence of cancer). Our proposed algorithm takes as input a set of articles that characterize the information of interest and builds a learner model that identifies a small subset of keywords capable of classifying papers into two types: articles that contain the information of interest and articles that do not. Experiments show that our new algorithm gives higher classification accuracy with a smaller number of selected keywords than Gene Selection, one of the best algorithms reported in the literature.
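The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea it describes: train a linear SVM on keyword-presence features, keep the few keywords with the largest absolute weights, and classify with that reduced set. It is not the authors' method and not the Gene Selection algorithm. The function names, the weight-based ranking heuristic, the hyperparameters, and the six toy "articles" are all assumptions made up for illustration; none of them comes from the paper or from PubMed.

```python
def featurize(doc, vocab):
    """Binary keyword-presence vector for one document."""
    tokens = set(doc.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocab]

def score(w, b, x):
    """Signed distance-like score of a linear classifier."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=2000):
    """Batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1:  # margin violated: hinge is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def select_keywords(w, vocab, k):
    """Keep the k keywords with the largest absolute SVM weight (assumed heuristic)."""
    ranked = sorted(range(len(vocab)), key=lambda j: -abs(w[j]))
    return [vocab[j] for j in ranked[:k]]

def predict(w, b, x):
    """+1: contains the information of interest, -1: does not."""
    return 1 if score(w, b, x) > 0 else -1

# Hypothetical toy corpus: +1 = relevant article, -1 = irrelevant article.
docs = [
    "the motif binding study", "motif binding protein", "the protein motif binding",
    "the weather study", "the protein study", "weather protein the",
]
labels = [1, 1, 1, -1, -1, -1]
vocab = ["motif", "binding", "protein", "the", "study", "weather"]

# Step 1: train on all keywords and rank them by weight magnitude.
X_full = [featurize(doc, vocab) for doc in docs]
w_full, b_full = train_linear_svm(X_full, labels)
selected = select_keywords(w_full, vocab, k=2)

# Step 2: retrain using only the selected keywords, then classify unseen text.
X_small = [featurize(doc, selected) for doc in docs]
w_small, b_small = train_linear_svm(X_small, labels)
new_docs = ["a new motif binding site", "the weather today"]
predictions = [predict(w_small, b_small, featurize(doc, selected)) for doc in new_docs]

print(selected)
print(predictions)
```

A single ranking pass is used here only to keep the sketch short; approaches that repeatedly retrain and discard the weakest features follow the same pattern with a loop around steps 1 and 2.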
