Conference: Advances in Artificial Intelligence
Recognizing Biomedical Named Entities in Chinese Research Abstracts

Abstract

Most research on biomedical named entity recognition (NER) has focused on English texts such as MEDLINE abstracts. However, recent years have also seen significant growth in biomedical publications in other languages. For example, the Chinese Biomedical Bibliographic Database has collected over 3 million articles published after 1978 in 1,600 Chinese biomedical journals. We present a Conditional Random Field (CRF) based system for recognizing biomedical named entities in Chinese texts. Treating Chinese sentences as sequences of characters, we trained and tested the CRF model on a manually annotated corpus of 106 research abstracts (481 sentences in total). The features used by the CRF model include word segmentation tags produced by a segmenter trained on newswire corpora, and lists of frequent characters gathered from the training data and external resources. With 400 randomly selected sentences for training and the remaining 81 for testing, our system obtained a 68.60% F-score on average, significantly outperforming the baseline system (60.54% F-score using a simple dictionary match). This suggests that statistical approaches such as CRFs trained on annotated corpora hold promise for biomedical NER in Chinese texts.
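The character-level feature design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the B/I/E/S segmentation tag scheme, the one-character context window, the feature names, and the example sentence and character list are all assumptions made for illustration. The abstract only states that segmentation tags and frequent-character lists were used as features.

```python
from typing import Dict, List, Set


def segmentation_tags(words: List[str]) -> List[str]:
    """Map a word-segmented sentence to one tag per character.

    Assumed scheme (not stated in the abstract): S = single-character word,
    B/I/E = beginning, inside, end of a multi-character word.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def sentence_features(words: List[str], frequent_chars: Set[str]) -> List[Dict[str, object]]:
    """Build one feature dictionary per character of the sentence.

    `words` is the output of some external segmenter (hypothetical here);
    `frequent_chars` stands in for a list of characters that occur often
    inside entity names.
    """
    chars = [c for word in words for c in word]
    tags = segmentation_tags(words)
    features = []
    for i, ch in enumerate(chars):
        features.append({
            "char": ch,
            "seg": tags[i],
            "in_frequent_list": ch in frequent_chars,
            # Neighbouring characters, with boundary markers at the edges.
            "prev_char": chars[i - 1] if i > 0 else "<BOS>",
            "next_char": chars[i + 1] if i < len(chars) - 1 else "<EOS>",
        })
    return features


# Illustrative input only: a pre-segmented sentence and a toy character list.
example = sentence_features(["胰岛素", "可", "降低", "血糖"], {"素", "酶"})
```

Each dictionary in `example` describes one character, so a sequence labeler such as a CRF would predict one entity label per character rather than per word.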
