Journal: BMC Bioinformatics

Gene/protein name recognition based on support vector machine using dictionary as features



Abstract

Background: Automated information extraction from biomedical literature is important because a vast amount of it has been published. Recognition of biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the support vector machine (SVM) algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition.

Results: In the work presented here, our recognition system uses a feature set consisting of the word, the part-of-speech (POS) tag, the orthography, the prefix, the suffix, and the preceding class. We call these "internal resource features", i.e., features that can be derived from the training data. In addition, we treat features based on matching against dictionaries as "external resource features". We investigated and evaluated the effect of these features, as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features yielded only a slight improvement in F-score. We attribute this to the possibility that the dictionary matching features overlap with other features in the current multiple-feature setting.

Conclusion: During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the view that the SVM algorithm is robust to the high dimensionality of the feature vector space, and implies that feature selection is not required.
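A minimal sketch, assuming a token-level B/I/O tagging formulation, of how the feature set named in the abstract (word, POS, orthography, prefix, suffix, preceding class, plus dictionary matching) might be turned into the sparse binary vectors an SVM consumes. The orthographic categories, the ±1 token window, the affix length of 3, and the longest-match dictionary lookup are illustrative assumptions, not the authors' reported configuration; `orthography`, `dictionary_tags`, `token_features`, and `to_sparse` are hypothetical names. POS tags are assumed to come from an external tagger.

```python
import re
from typing import Dict, List, Set, Tuple


def orthography(word: str) -> str:
    """Coarse orthographic class of a token (hypothetical category set)."""
    if re.fullmatch(r"[A-Z]+", word):
        return "ALLCAPS"
    if re.fullmatch(r"[A-Z][a-z]+", word):
        return "INITCAP"
    if re.fullmatch(r"[a-z]+", word):
        return "LOWER"
    if re.fullmatch(r"\d+", word):
        return "DIGITS"
    if re.search(r"[A-Za-z]", word) and re.search(r"\d", word):
        return "ALPHANUM"  # e.g. "IL-2", "p53"
    if "-" in word:
        return "HASDASH"
    return "OTHER"


def dictionary_tags(tokens: List[str], dictionary: Set[Tuple[str, ...]],
                    max_len: int = 5) -> List[str]:
    """External resource feature: B/I/O tags from greedy longest-match
    lookup of lowercased token n-grams in a (hypothetical) name dictionary."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = 0
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in dictionary:
                match = n
                break
        if match:
            tags[i] = "B"
            for j in range(i + 1, i + match):
                tags[j] = "I"
            i += match
        else:
            i += 1
    return tags


def token_features(tokens: List[str], pos_tags: List[str], dict_tags: List[str],
                   i: int, prev_class: str, affix_len: int = 3) -> Dict[str, float]:
    """Binary features for token i. prev_class is the gold label of token i-1
    during training and the classifier's own previous prediction when tagging."""
    feats: Dict[str, float] = {}
    for offset in (-1, 0, 1):  # assumed context window of one token each side
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"w[{offset}]={tokens[j].lower()}"] = 1.0
            feats[f"pos[{offset}]={pos_tags[j]}"] = 1.0
            feats[f"orth[{offset}]={orthography(tokens[j])}"] = 1.0
            feats[f"dict[{offset}]={dict_tags[j]}"] = 1.0
    word = tokens[i]
    for k in range(1, affix_len + 1):
        if len(word) >= k:
            feats[f"pre{k}={word[:k].lower()}"] = 1.0
            feats[f"suf{k}={word[-k:].lower()}"] = 1.0
    feats[f"prevclass={prev_class}"] = 1.0
    return feats


def to_sparse(feats: Dict[str, float], vocab: Dict[str, int]) -> List[Tuple[int, float]]:
    """Map feature names to 1-based indices (growing vocab), sorted by index,
    giving the high-dimensional sparse vector an SVM trains on."""
    for name in feats:
        if name not in vocab:
            vocab[name] = len(vocab) + 1
    return sorted((vocab[name], value) for name, value in feats.items())


if __name__ == "__main__":
    tokens = ["The", "IL-2", "receptor", "binds", "p53", "."]
    pos_tags = ["DT", "NN", "NN", "VBZ", "NN", "."]  # assumed tagger output
    gene_dict = {("il-2", "receptor"), ("p53",)}  # toy dictionary, not a real resource
    dtags = dictionary_tags(tokens, gene_dict)
    print(dtags)  # ['O', 'B', 'I', 'O', 'B', 'O']
    vocab: Dict[str, int] = {}
    print(to_sparse(token_features(tokens, pos_tags, dtags, 1, "O"), vocab))
```

Every feature is a separate binary dimension, so the vector space grows with the vocabulary of words and affixes; this is the kind of high-dimensional input the conclusion refers to when it says the SVM copes without explicit feature selection.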
