Journal: Journal of food, agriculture & environment

Identification of discriminative features for biological event extraction through linguistically informed feature selection



Abstract

Machine learning classifiers have achieved strong performance in biomedical event extraction. For example, the support vector machine (SVM) classifiers in the Turku Event Extraction System achieved the best performance in the BioNLP09 task. Such classifiers typically rely on large feature sets. Despite their robust performance, however, recent research has suggested that feature sets produced through automatic training need to be further optimized through size reduction in order to improve system performance. The current paper attempts to identify ways to reduce the size of feature sets by investigating the contribution of four different feature sets constructed according to lexical, grammatical, syntactic and semantic information. It reports an experiment based on BioNLP data prepared by the Turku team for biological event extraction and examines to what extent the dimension of the feature sets can be reduced while the classifier still achieves similar performance. The importance of each feature set is evaluated with an SVM classifier. Our experiments demonstrate that constructing feature sets according to lexical, grammatical and syntactic information can reduce the set size by as much as 86% while maintaining comparable performance, thus largely resolving the feature dimension issue. Our experiments also show that a hybrid feature set constructed from a combination of lexical and semantic information achieves the second highest accuracy, indicating that it is feasible to construct an optimal feature set through dimension reduction and feature combination. We conclude that the experiments reported in the current paper provide empirical evidence that linguistic information, in addition to domain knowledge, is important for constructing high-performance feature sets for biomedical event extraction.
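The idea of restricting a feature set to selected linguistic categories and measuring the resulting size reduction can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names, the "group:" prefix convention, and the helper functions are hypothetical and are not taken from the paper or from the Turku Event Extraction System. The sketch covers only the bookkeeping of group-wise selection, not the SVM training or evaluation.

```python
from collections import Counter

# Hypothetical feature names. The "group:" prefix (lex, gram, syn, sem)
# is an illustrative convention, not the Turku system's naming scheme.
FEATURES = [
    "lex:token=phosphorylation",
    "lex:stem=bind",
    "gram:pos=NN",
    "gram:pos=VBZ",
    "syn:dep=nsubj",
    "syn:path=nsubj>dobj",
    "sem:entity=Protein",
    "sem:trigger_type=Binding",
]


def group_of(feature):
    """Return the linguistic group encoded before the first colon."""
    return feature.split(":", 1)[0]


def select_groups(features, keep):
    """Keep only the features whose group is in the `keep` set."""
    return [f for f in features if group_of(f) in keep]


def size_reduction(full, reduced):
    """Percentage by which `reduced` is smaller than `full`."""
    return 100.0 * (len(full) - len(reduced)) / len(full)


# Example: a hybrid set built from lexical and semantic features only.
subset = select_groups(FEATURES, {"lex", "sem"})
print(Counter(group_of(f) for f in FEATURES))
print(f"kept {len(subset)} of {len(FEATURES)} features, "
      f"reduction = {size_reduction(FEATURES, subset):.1f}%")
```

In a real experiment, each selected subset would be used to train and evaluate a classifier, and the reduction figure would be compared against the change in performance.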
