Journal: Soft computing: A fusion of foundations, methodologies and applications

Feature selection for entity extraction from multiple biomedical corpora: A PSO-based approach


Abstract

Entity extraction is an important step in biomedical text mining. Among many other challenges, two issues are crucial: determining the most applicable feature set, so that the model is precise and less complex, and adapting the system across multiple benchmark corpora. In this paper, we propose a novel method for feature selection using the search capability of particle swarm optimization (PSO). The compact feature set used for training the classifier yields much better results than the baseline model, which was developed with the complete set of features. We also develop a large number of features suitable for the named entity recognition task in the biomedical domain. The complete feature set is derived by studying the properties of the datasets and from domain knowledge. As the underlying learning algorithm we use a conditional random field, a robust classifier that has shown success in solving similar kinds of problems. Our experiments on multiple benchmark corpora yield performance on par with state-of-the-art techniques.
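
The abstract does not say how particles are encoded or how candidate feature sets are scored, so the following is only a minimal sketch of a generic binary PSO for feature selection, not the authors' implementation. Each particle is a 0/1 mask over features, and velocities are passed through a sigmoid to give the probability of selecting each feature. The function names, the parameter values (`w`, `c1`, `c2`), and the toy fitness function are all assumptions made for illustration. In a setting like the one described, the fitness would presumably be the score of a classifier trained on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic binary PSO: maximise fitness(mask) over 0/1 feature masks.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    # Positions are bit masks; velocities are real-valued.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so the sigmoid does not saturate.
                vel[i][d] = max(-6.0, min(6.0, v))
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy problem: features 0, 2 and 5 are "useful"; every
# selected feature costs 0.1, which rewards compact feature sets.
USEFUL = {0, 2, 5}


def toy_fitness(mask):
    gain = sum(bit for i, bit in enumerate(mask) if i in USEFUL)
    return gain - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = binary_pso(toy_fitness, n_features=8)
    print("selected features:", [i for i, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

The per-feature penalty in `toy_fitness` stands in for the abstract's goal of a model that is both precise and less complex. With a real classifier, the accuracy term and the size penalty would both have to be chosen for the task.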
