Gene/protein name recognition based on support vector machine using dictionary as features

Tomohiro Mitsumori; Sevrani Fation; Masaki Murata; Kouichi Doi; Hirohumi Doi

Abstract

Background: Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition.

Results: In the work presented here, our recognition system uses the feature set of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these features "internal resource features", i.e., features that can be found in the training data. Additionally, we consider the features of matching against dictionaries to be external resource features. We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features contributed slightly to the improvement in the performance of the f-score. We attribute this to the possibility that the dictionary matching features might overlap with other features in the current multiple feature setting.

Conclusion: During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the fact that the SVM algorithm is robust on the high dimensionality of the feature vector space and means that feature selection is not required.
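To make the feature set named in the abstract more concrete, here is a minimal Python sketch of how per-token features of this kind might be assembled. It is an illustration only, not the authors' implementation: the orthographic categories, the affix length of 3, the longest-match dictionary lookup, and all function and variable names (`orthography`, `dictionary_flags`, `token_features`, `gene_dictionary`) are assumptions introduced here, and the POS tags are supplied by hand rather than by a tagger.

```python
import re


def orthography(token):
    """Coarse orthographic class (categories are illustrative assumptions)."""
    if re.fullmatch(r"\d+", token):
        return "DIGITS"
    if re.fullmatch(r"[A-Z]+", token):
        return "ALLCAPS"
    if re.search(r"\d", token) and re.search(r"[A-Za-z]", token):
        return "ALPHANUM"
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "INITCAP"
    if re.fullmatch(r"[a-z]+", token):
        return "LOWER"
    return "OTHER"


def dictionary_flags(tokens, dictionary, max_len=3):
    """External resource feature: flag tokens covered by a dictionary entry.

    Tries the longest candidate phrase first (up to max_len tokens),
    comparing lowercased, space-joined tokens against the dictionary.
    """
    flags = [False] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for start in range(len(tokens)):
        for length in range(max_len, 0, -1):
            end = start + length
            if end <= len(tokens) and " ".join(lowered[start:end]) in dictionary:
                for i in range(start, end):
                    flags[i] = True
                break
    return flags


def token_features(tokens, pos_tags, prev_label, i, dict_flags, affix_len=3):
    """Internal resource features plus the dictionary flag for token i."""
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "pos": pos_tags[i],
        "orth": orthography(tok),
        "prefix": tok[:affix_len].lower(),
        "suffix": tok[-affix_len:].lower(),
        "prev_class": prev_label,   # class assigned to the preceding token
        "in_dict": dict_flags[i],   # dictionary matching feature
    }


# Hypothetical example sentence, hand-assigned POS tags, and toy dictionary.
tokens = ["The", "p53", "tumor", "suppressor", "binds", "DNA"]
pos_tags = ["DT", "NN", "NN", "NN", "VBZ", "NN"]
gene_dictionary = {"p53", "tumor suppressor"}

flags = dictionary_flags(tokens, gene_dictionary)
features = token_features(tokens, pos_tags, "O", 1, flags)
print(features)
```

In a pipeline of this general shape, each feature dictionary would typically be converted to a sparse binary vector before being passed to an SVM. Under these assumptions, a dictionary hit such as `p53` also triggers the `ALPHANUM` orthographic class, which illustrates how a dictionary feature can overlap with other features.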

Bibliographic record

Source: BMC Bioinformatics, 2005, issue 1
Original format: PDF
Classification: Biological sciences