Conference: Document Recognition and Retrieval XVIII

Automated Identification of Biomedical Article Type Using Support Vector Machines

Abstract

Authors of short papers such as letters or editorials often express complementary, and sometimes contradictory, opinions on related work in previously published articles. The MEDLINE® citations for such short papers are required to list bibliographic data on these "commented on" articles in a "CON" field. The challenge is to automatically identify the CON articles referred to by the author of the short paper (called a "Comment-in" or CIN paper). Our approach is to use support vector machines (SVMs) to first classify a paper as either a CIN paper or a regular full-length article (which is exempt from this requirement), and then to extract from the CIN paper the bibliographic data of the CON articles. This paper addresses the first part of the problem, identifying CIN articles. We implement and compare the performance of two types of SVM, one with a linear kernel and the other with a radial basis function (RBF) kernel. Input feature vectors for the SVMs are created by combining four types of features: statistics of words in the article title, words that suggest the article type (letter, correspondence, editorial), size of the body text, and cue phrases. Experiments conducted on a set of online biomedical articles show that the SVM with a linear kernel yields a significantly lower false negative error rate than the one with an RBF kernel. Our experiments also show that the SVM with a linear kernel achieves significantly higher accuracy, and lower false positive and false negative error rates, when its input feature vectors combine all four types of features rather than using any single type.
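As a rough illustration of the feature combination and the error rates described in the abstract, the sketch below concatenates four feature groups into one vector and computes false positive and false negative rates. It is not the authors' implementation: the word lists, the cue phrases, the 5000-word size cap, the use of binary indicators in place of the paper's title-word statistics, and the function names `feature_vector` and `error_rates` are all assumptions made for illustration. The resulting vectors could be passed to any SVM library with a linear or RBF kernel; that step is not shown here.

```python
import re

# Hypothetical vocabularies: the abstract does not list the actual words or phrases.
TITLE_WORDS = ("reply", "response", "re")
TYPE_WORDS = ("letter", "correspondence", "editorial")
CUE_PHRASES = ("in response to", "comment on", "we read with interest")
MAX_BODY_WORDS = 5000.0  # assumed cap used to scale body size into [0, 1]


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def feature_vector(title, header, body):
    """Concatenate four feature groups into one flat list of floats."""
    title_tokens = tokenize(title)
    header_tokens = tokenize(header)
    body_lower = body.lower()
    # 1. Title words (binary indicators stand in for the paper's word statistics).
    title_feats = [float(w in title_tokens) for w in TITLE_WORDS]
    # 2. Words that suggest the article type.
    type_feats = [float(w in header_tokens) for w in TYPE_WORDS]
    # 3. Size of the body text, scaled and clipped to [0, 1].
    size_feat = [min(len(tokenize(body)) / MAX_BODY_WORDS, 1.0)]
    # 4. Cue phrases found anywhere in the body.
    cue_feats = [float(p in body_lower) for p in CUE_PHRASES]
    return title_feats + type_feats + size_feat + cue_feats


def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate); label 1 means CIN."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


if __name__ == "__main__":
    vec = feature_vector(
        "Reply to Smith",
        "Letter to the Editor",
        "We read with interest the article by Smith.",
    )
    print(vec)
    print(error_rates([1, 1, 0, 0], [1, 0, 1, 0]))
```

Keeping each feature group in its own list makes it straightforward to train on a single group or on the full concatenation, which is the comparison the abstract reports.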