Venue: 6th Workshop on Health Text Mining and Information Analysis

Exploring Word Embedding for Drug Name Recognition

Abstract

This paper describes a machine learning-based approach that uses word embedding features to recognize drug names in biomedical texts. As a starting point, we developed a baseline system based on a Conditional Random Field (CRF) trained with the standard features used in current Named Entity Recognition (NER) systems. We then extended the system with new features: word vectors and word clusters generated by the Word2Vec tool, and a lexicon feature derived from the DINTO ontology. We trained Word2Vec on two different corpora: Wikipedia and MedLine. Our main goal is to study whether word embeddings used as features improve the performance of the baseline system, and to analyze whether the DINTO ontology is a valuable complementary data source for a machine learning NER system. To evaluate our approach and compare it with previous work, we conducted a series of experiments on the dataset of SemEval-2013 Task 9.1 (Drug Name Recognition).
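The combination of baseline NER features, Word2Vec-derived features, and a lexicon feature can be sketched as per-token feature dictionaries, the list-of-dicts format used by Python CRF wrappers such as sklearn-crfsuite. This is a minimal sketch under stated assumptions, not the authors' implementation: the cluster map, the lexicon (a stand-in for drug terms taken from DINTO), the toy embedding table, and all function and feature names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Union

Feature = Union[str, float, bool]


def token_features(
    tokens: Sequence[str],
    i: int,
    clusters: Dict[str, int],        # hypothetical word -> cluster ID map (e.g. from clustering Word2Vec vectors)
    lexicon: Set[str],               # hypothetical lowercase drug-name list (stand-in for DINTO terms)
    vectors: Dict[str, List[float]], # hypothetical word -> embedding table
) -> Dict[str, Feature]:
    """Build an illustrative feature dict for tokens[i]."""
    word = tokens[i]
    lower = word.lower()
    feats: Dict[str, Feature] = {
        # Baseline orthographic features of the kind common in NER systems.
        "word.lower": lower,
        "prefix3": lower[:3],
        "suffix3": lower[-3:],
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
        "has_digit": any(c.isdigit() for c in word),
        "has_hyphen": "-" in word,
        # Word-cluster feature: categorical cluster ID, "-1" for unknown words.
        "cluster": str(clusters.get(lower, -1)),
        # Lexicon feature: membership in the drug-name list.
        "in_lexicon": lower in lexicon,
    }
    # Word-vector features: one real-valued feature per embedding dimension.
    for d, value in enumerate(vectors.get(lower, [])):
        feats[f"emb{d}"] = value
    # Context window of one token on each side, with sentence-boundary markers.
    feats["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(
    tokens: Sequence[str],
    clusters: Dict[str, int],
    lexicon: Set[str],
    vectors: Dict[str, List[float]],
) -> List[Dict[str, Feature]]:
    """Return one feature dict per token, ready to pair with a BIO label sequence."""
    return [token_features(tokens, i, clusters, lexicon, vectors) for i in range(len(tokens))]


if __name__ == "__main__":
    # Toy inputs; real systems would load clusters and vectors trained on Wikipedia or MedLine.
    tokens = ["Aspirin", "reduces", "fever", "."]
    clusters = {"aspirin": 7, "ibuprofen": 7}
    lexicon = {"aspirin"}
    vectors = {"aspirin": [0.1, -0.2]}
    for tok, f in zip(tokens, sentence_features(tokens, clusters, lexicon, vectors)):
        print(tok, f["cluster"], f["in_lexicon"])
```

Keeping cluster IDs as categorical strings and embedding dimensions as real-valued features lets the same lookup tables drive both the "word clusters" and "word vectors" variants, so each can be switched on or off independently against the baseline.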