Journal: Journal of food, agriculture & environment

Identification of discriminative features for biological event extraction through linguistically informed feature selection

Abstract

Machine learning classifiers have achieved strong performance in biomedical event extraction. For example, the support vector machine (SVM) classifiers in the Turku Event Extraction System achieved the best performance in the BioNLP09 task. Such classifiers typically rely on large feature sets. Despite their robust performance, recent research suggests that automatically generated feature sets should be reduced in size to improve system performance. This paper looks for ways to reduce feature set size by investigating the contribution of four feature sets built from lexical, grammatical, syntactic and semantic information respectively. It reports an experiment on BioNLP data prepared by the Turku team for biological event extraction and examines how far the dimension of the feature sets can be reduced while the classifier still achieves similar performance. The importance of each feature set is evaluated with an SVM classifier. The experiments show that constructing the feature set from lexical, grammatical and syntactic information can reduce its size by as much as 86% while maintaining comparable performance, which substantially alleviates the feature dimension problem. They also show that a hybrid feature set combining lexical and semantic information achieves the second highest accuracy, which indicates that an optimal feature set can feasibly be constructed through dimension reduction and feature combination. We conclude that these experiments provide empirical evidence that linguistic information, in addition to domain knowledge, is important for constructing high-performance feature sets for biomedical event extraction.
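
The group-wise evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the Turku feature set: the feature names, the `group:name` prefix convention, the toy examples and the small Pegasos-style linear SVM trainer are all assumptions made for illustration. The idea shown is to keep only the features belonging to chosen linguistic groups, measure how much the dimension shrinks, and compare classifier accuracy before and after.

```python
import random

# Hypothetical convention: each feature name is "group:detail", where the
# group is one of lexical, grammatical, syntactic, semantic.

def select_groups(examples, keep):
    """Return a copy of the data with only features whose group is in `keep`."""
    return [
        ({n: v for n, v in feats.items() if n.split(":", 1)[0] in keep}, label)
        for feats, label in examples
    ]

def dimension(examples):
    """Number of distinct feature names used anywhere in the data."""
    return len({n for feats, _ in examples for n in feats})

def train_linear_svm(examples, epochs=20, lam=0.01, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    data = list(examples)
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(w.get(n, 0.0) * v for n, v in feats.items())
            for n in w:                      # regularisation shrink step
                w[n] *= 1.0 - eta * lam
            if margin < 1.0:                 # hinge loss is active
                for n, v in feats.items():
                    w[n] = w.get(n, 0.0) + eta * label * v
    return w

def accuracy(w, examples):
    """Fraction of examples whose predicted sign matches the label."""
    hits = 0
    for feats, label in examples:
        score = sum(w.get(n, 0.0) * v for n, v in feats.items())
        hits += (1 if score >= 0 else -1) == label
    return hits / len(examples)

# Toy data (invented): label +1 marks an event trigger, -1 a non-trigger.
toy = [
    ({"lexical:phosphorylation": 1.0, "syntactic:dep=nsubj": 1.0,
      "semantic:GO_term": 1.0}, 1),
    ({"lexical:binds": 1.0, "grammatical:pos=VBZ": 1.0,
      "semantic:GO_term": 1.0}, 1),
    ({"lexical:the": 1.0, "grammatical:pos=DT": 1.0}, -1),
    ({"lexical:of": 1.0, "syntactic:dep=prep": 1.0,
      "semantic:none": 1.0}, -1),
]

hybrid = select_groups(toy, {"lexical", "semantic"})
reduction = 1.0 - dimension(hybrid) / dimension(toy)
acc_full = accuracy(train_linear_svm(toy), toy)
acc_hybrid = accuracy(train_linear_svm(hybrid), hybrid)
print(f"dimension {dimension(toy)} -> {dimension(hybrid)} "
      f"({reduction:.0%} smaller)")
print(f"training accuracy: full={acc_full:.2f}, hybrid={acc_hybrid:.2f}")
```

On real data the accuracies would be measured on held-out examples rather than on the training set, and the same loop would be repeated for each group and each combination of groups to see which subsets preserve performance.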
